MDPI - Publisher of Open Access Journals

17 pages, 1326 KB

Open AccessArticle

A New Estimator of Kullback–Leibler Divergence via Shannon Entropy

by Mehmet Sıddık Çadırcı and Martin Singull

Entropy 2026, 28(7), 720; https://doi.org/10.3390/e28070720 (registering DOI) - 24 Jun 2026

We examine the estimation of the Kullback–Leibler (KL) divergence and the use of the goodness-of-fit test for multivariate normality. Our starting point is the maximum entropy principle for Shannon entropy: among all distributions with a fixed mean vector and covariance matrix, the multivariate Gaussian distributions uniquely maximize entropy. As a result, the KL divergence from a moment-matched Gaussian distribution to an unknown density can then be written as the entropy difference, which is a suitable information-theoretic measure of divergence from the Gaussian distribution. To estimate, we use k-nearest neighbor (kNN) estimators based on Shannon entropy and KL divergence derived from the Kozachenko–Leonenko approach and subsequent improvements, along with the consistency and $L^{2}$-convergence results established for these estimators. Motivated by previous entropy-based goodness-of-fit ideas developed for Rényi-type functionals for generalized Gaussian and Student-type models, we describe a KL-based test statistic as being the difference between the entropy of a Gaussian model fitted to the sample mean and covariance and the KL divergence between the unknown entropy and the kNN estimate. The statistic converges to zero for multivariate normality and converges to a strictly positive bound with non-Gaussian alternatives. The results of Monte Carlo simulations conducted across various dimensions and sample sizes indicate that the proposed method provides accurate Type I error control among the alternatives considered and demonstrates promising empirical power. Full article

(This article belongs to the Section Information Theory, Probability and Statistics)
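
The entropy-difference idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes univariate data for brevity, uses the textbook Kozachenko–Leonenko kNN entropy estimate, and all function names (`knn_entropy_1d`, `normality_statistic`, and so on) are hypothetical. The statistic is the entropy of a Gaussian fitted to the sample variance minus the kNN entropy estimate, which should sit near zero for Gaussian data and be clearly positive otherwise.

```python
import math
import random


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    euler_gamma = 0.5772156649015329
    return -euler_gamma + sum(1.0 / j for j in range(1, n))


def knn_entropy_1d(sample, k=3):
    """Kozachenko-Leonenko kNN estimate of Shannon entropy in nats.

    Assumes 1-D data with more than k points and no exact ties.
    """
    xs = sorted(sample)
    n = len(xs)
    log_eps_sum = 0.0
    for i, x in enumerate(xs):
        # In sorted order, the k nearest neighbours lie within k positions of i.
        window = xs[max(0, i - k): i + k + 1]
        dists = sorted(abs(v - x) for v in window)
        eps = max(dists[k], 1e-300)  # dists[0] is the point itself
        log_eps_sum += math.log(eps)
    unit_ball_volume = 2.0  # length of [-1, 1] in one dimension
    return (digamma_int(n) - digamma_int(k)
            + math.log(unit_ball_volume) + log_eps_sum / n)


def gaussian_entropy_1d(sample):
    """Entropy (nats) of a Gaussian with the sample's variance."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return 0.5 * math.log(2.0 * math.pi * math.e * var)


def normality_statistic(sample, k=3):
    """Fitted-Gaussian entropy minus kNN entropy estimate."""
    return gaussian_entropy_1d(sample) - knn_entropy_1d(sample, k)


rng = random.Random(0)
gaussian_sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
uniform_sample = [rng.random() for _ in range(2000)]
stat_gaussian = normality_statistic(gaussian_sample)
stat_uniform = normality_statistic(uniform_sample)
print(stat_gaussian, stat_uniform)
```

Turning this statistic into a test would still require a critical value, for instance from Monte Carlo simulation under the Gaussian null; that step is left out here.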

17 pages, 3310 KB

Open AccessArticle

Research on an Adaptive Selection Method for GNSS Signals in Passive Radar

by Hongwei Fu, Hao Cha, Yu Luo, Tingting Fu, Bin Tian and Huatao Tang

Electronics 2026, 15(3), 648; https://doi.org/10.3390/electronics15030648 - 2 Feb 2026

Viewed by 576

Abstract

Limited computational resources prevent GNSS-based passive radar systems from processing all accessible signals, necessitating intelligent signal selection for efficient target tracking. This paper proposes an adaptive selection method based on Rényi divergence. Within the Cardinality Balanced Multi-Bernoulli (CBMeMBer) filter framework, the method establishes an optimization model that maximizes the expected information gain under a fixed signal-number constraint. To comprehensively validate performance, simulations are conducted under three scenarios: multi-target linear motion, single-target tracking (for comparison with the classical Geometric Dilution of Precision (GDOP) criterion), and multi-target nonlinear maneuvering. Results demonstrate that the proposed algorithm significantly reduces computational load while achieving tracking accuracy superior to random selection and comparable to using all satellites. Compared to the GDOP-based method, it exhibits improved steady-state tracking accuracy by leveraging its dynamic, information-driven selection mechanism. This work provides an effective solution for intelligent resource management in resource-constrained GNSS-based passive radar systems. Full article

(This article belongs to the Special Issue Advances in Radar Signal Processing Technology and Its Application)

14 pages, 466 KB

Open AccessReview

Density Functional Theory and Information-Theoretic Diagnostics of Quantum Phase Transitions

by Elvira Romera and Ágnes Nagy

Entropy 2026, 28(2), 170; https://doi.org/10.3390/e28020170 - 1 Feb 2026

Viewed by 454

Abstract

Within density functional theory (DFT), where the density is the fundamental variable, quantum phase transitions (QPTs) can be formulated through a Hamiltonian $\hat{H} = \hat{H}_{0} + \sum_{i} \xi_{i} \hat{A}_{i}$, such that the control parameters $\{\xi_{i}\}$ are in bijective correspondence (in the nondegenerate case) with the “densities” $a_{i} = \langle \hat{A}_{i} \rangle$, and the functional $Q(\{a_{i}\})$ acts as the Legendre transform of the energy; this structure even permits the use of Rényi entropy (for a given order) as an alternative control parameter, while degeneracy can be handled via a subspace density. On this foundation, information-theoretic measures provide sensitive diagnostics of criticality: fidelity and its susceptibility $\chi$, Fisher information, relative Rényi entropy, and the Kullback–Leibler divergence are locally linked by $R^{q} \approx q I_{KL} \approx 2 q \chi (\delta\lambda)^{2}$, revealing their proportionality in the small-parameter-shift regime. Applied to the Dicke model, numerical analyses show that fidelity exhibits pronounced curvature or divergence near $\lambda_{c} = \sqrt{\omega \omega_{0}}/2$ and that the response sharpens with increasing j, corroborating that these information measures capture QPTs with precision within the DFT framework. Full article

(This article belongs to the Special Issue The Information-Theoretic Approach in Density Functional Theory and Beyond)
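
The local proportionality between relative Rényi entropy, KL divergence, and fidelity susceptibility quoted in this abstract can be checked numerically on a classical toy family. The sketch assumes a Bernoulli distribution with parameter λ in place of a quantum density, the classical fidelity F = Σ √(p q), and the convention F ≈ 1 − χ (δλ)²/2 for the susceptibility. None of these choices come from the paper, and the Dicke model itself is not simulated.

```python
import math


def bernoulli(p):
    """Two-point distribution [p, 1 - p]."""
    return [p, 1.0 - p]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def renyi_divergence(p, q, order):
    """Renyi divergence in nats, for order > 0 and order != 1."""
    s = sum(pi ** order * qi ** (1.0 - order) for pi, qi in zip(p, q))
    return math.log(s) / (order - 1.0)


def fidelity(p, q):
    """Classical fidelity (Bhattacharyya coefficient)."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


lam, delta, order = 0.3, 1e-3, 2.0
base = bernoulli(lam)
shifted = bernoulli(lam + delta)

kl = kl_divergence(shifted, base)
renyi = renyi_divergence(shifted, base, order)
# Susceptibility read off from F ~ 1 - chi * delta^2 / 2 (assumed convention).
chi = 2.0 * (1.0 - fidelity(shifted, base)) / delta ** 2

# The three quantities should agree up to corrections of relative order delta.
print(renyi, order * kl, 2.0 * order * chi * delta ** 2)
```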

21 pages, 342 KB

Open AccessArticle

Strongly F-Convex Functions with Structural Characterizations and Applications in Entropies

by Hasan Barsam, Slavica Ivelić Bradanović, Matea Jelić and Yamin Sayyari

Axioms 2025, 14(12), 926; https://doi.org/10.3390/axioms14120926 - 16 Dec 2025

Viewed by 804

Abstract

Strongly convex functions form a central subclass of convex functions and have gained considerable attention due to their structural advantages and broad applicability, particularly in optimization and information theory. In this paper, we investigate the class of strongly F-convex functions, which generalizes the classical notion of strong convexity by introducing an auxiliary convex control function F. We establish several fundamental structural characterizations of this class and provide a variety of nontrivial examples such as power, logarithmic, and exponential functions. In addition, we derive refined Jensen-type and Hermite–Hadamard-type inequalities adapted to the strongly F-convex concept, thereby extending and sharpening their classical forms. As applications, we obtain new analytical inequalities and improved error bounds for entropy-related quantities, including Shannon, Tsallis, and Rényi entropies, demonstrating that the concept of strong F-convexity naturally yields strengthened divergence and uncertainty estimates. Full article

(This article belongs to the Special Issue Advances in Functional Analysis and Banach Space)

29 pages, 8538 KB

Open AccessArticle

A Hierarchical Adaptive Moment Matching Multiple Model Tracking Method for Hypersonic Glide Target Under Measurement Uncertainty

by Hanxing Shao, Jibin Zheng, Yanwen Bai, Hongwei Liu, Ye Ge and Boyang Liu

Sensors 2025, 25(21), 6621; https://doi.org/10.3390/s25216621 - 28 Oct 2025

Viewed by 1070

Abstract

Hypersonic glide targets (HGTs) pose significant challenges for radar tracking due to complex maneuver strategies and time-varying statistics of measurement noise. Conventional single-model tracking methods are generally insufficient to fully capture maneuver modes, while existing multiple-model methods face trade-offs between model set completeness and computational efficiency. In addition, existing tracking methods struggle to cope with the non-Gaussian noise during hypersonic flight. To overcome these limitations, a Hierarchical Adaptive Moment Matching (HAMM) multiple-model method is proposed in this paper. Firstly, a comprehensive model set is constructed to cover characteristic maneuver modes. Subsequently, a hierarchical multiple-model framework is developed where: (1) a coarse model set is dynamically adapted by multi-frame posterior probability evolution and Rényi divergence criteria; (2) a fine model set is generated based on the moment matching method. Furthermore, the minimum error entropy cubature Kalman filter (MEECKF) is proposed to suppress the non-Gaussian measurement noise with high stability. Monte Carlo simulations demonstrate that the proposed method achieves improved positioning accuracy and faster convergence. Full article

(This article belongs to the Section Radar Sensors)

33 pages, 1945 KB

Open AccessArticle

A Novel Distributed Hybrid Cognitive Strategy for Odor Source Location in Turbulent and Sparse Environment

by Yingmiao Jia, Shurui Fan, Weijia Cui, Chengliang Di and Yafeng Hao

Entropy 2025, 27(8), 826; https://doi.org/10.3390/e27080826 - 4 Aug 2025

Cited by 2 | Viewed by 1307

Abstract

Precise odor source localization in turbulent and sparse environments plays a vital role in enabling robotic systems for hazardous chemical monitoring and effective disaster response. To address this, we propose Cooperative Gravitational-Rényi Infotaxis (CGRInfotaxis), a distributed decision-optimization framework that combines multi-agent collaboration with hybrid cognitive strategy to improve search efficiency and robustness. The method integrates a gravitational potential field for rapid source convergence and Rényi divergence-based probabilistic exploration to handle sparse detections, dynamically balanced via a regulation factor. Particle filtering optimizes posterior probability estimation to autonomously refine search areas while preserving computational efficiency, alongside a distributed interactive-optimization mechanism for real-time decision updates through agent cooperation. The algorithm’s performance is evaluated in scenarios with fixed and randomized odor source locations, as well as with varying numbers of agents. Results demonstrate that CGRInfotaxis achieves a near-100% success rate with high consistency across diverse conditions, outperforming existing methods in stability and adaptability. Increasing the number of agents further enhances search efficiency without compromising reliability. These findings suggest that CGRInfotaxis significantly advances multi-agent odor source localization in turbulent, sparse environments, offering practical utility for real-world applications. Full article

(This article belongs to the Section Multidisciplinary Applications)

26 pages, 543 KB

Open AccessArticle

Bounds on the Excess Minimum Risk via Generalized Information Divergence Measures

by Ananya Omanwar, Fady Alajaji and Tamás Linder

Entropy 2025, 27(7), 727; https://doi.org/10.3390/e27070727 - 5 Jul 2025

Cited by 1 | Viewed by 965

Abstract

Given finite-dimensional random vectors Y, X, and Z that form a Markov chain in that order ($Y \to X \to Z$), we derive the upper bounds on the excess minimum risk using generalized information divergence measures. Here, Y is a target vector to be estimated from an observed feature vector X or its stochastically degraded version Z. The excess minimum risk is defined as the difference between the minimum expected loss in estimating Y from X and from Z. We present a family of bounds that generalize a prior bound based on mutual information, using the Rényi and $\alpha$-Jensen–Shannon divergences, as well as Sibson’s mutual information. Our bounds are similar to recently developed bounds for the generalization error of learning algorithms. However, unlike these works, our bounds do not require the sub-Gaussian parameter to be constant, and therefore, apply to a broader class of joint distributions over Y, X, and Z. We also provide numerical examples under both constant and non-constant sub-Gaussianity assumptions, illustrating that our generalized divergence-based bounds can be tighter than the ones based on mutual information for certain regimes of the parameter $\alpha$. Full article

(This article belongs to the Special Issue Information Theoretic Learning with Its Applications)

31 pages, 70417 KB

Open AccessArticle

Lightweight Text-to-Image Generation Model Based on Contrastive Language-Image Pre-Training Embeddings and Conditional Variational Autoencoders

by Yubo Wang and Gaofeng Zhang

Electronics 2025, 14(11), 2185; https://doi.org/10.3390/electronics14112185 - 28 May 2025

Viewed by 3038

Abstract

Deploying text-to-image (T2I) models is challenging due to high computational demands, extensive data needs, and the persistent goal of enhancing generation quality and diversity, particularly on resource-constrained devices. We introduce a lightweight T2I framework that uses a dual-conditioned Conditional Variational Autoencoder (CVAE), leveraging CLIP embeddings for semantic guidance and enabling explicit attribute control, thereby reducing computational load and data dependency. Key to our approach is a specialized mapping network that bridges CLIP text–image modalities for improved fidelity and Rényi divergence for latent space regularization to foster diversity, as evidenced by richer latent representations. Experiments on CelebA demonstrate competitive generation (FID: 40.53, 42 M params, 21 FPS) with enhanced diversity. Crucially, our model also shows effective generalization to the more complex MS COCO dataset and maintains a favorable balance between visual quality and efficiency (8 FPS at 256 × 256 resolution with 54 M params). Ablation studies and component validations (detailed in appendices) confirm the efficacy of our contributions. This work offers a practical, efficient T2I solution that balances generative performance with resource constraints across different datasets and is suitable for specialized, data-limited domains. Full article

(This article belongs to the Special Issue Big Model Techniques for Image Processing)

9 pages, 589 KB

Open AccessFeature PaperArticle

Axiomatic Approach to Measures of Total Correlations

by Gabriel L. Moraes, Renato M. Angelo and Ana C. S. Costa

Entropy 2024, 26(12), 1098; https://doi.org/10.3390/e26121098 - 15 Dec 2024

Cited by 1 | Viewed by 1576

Abstract

Correlations play a pivotal role in various fields of science, particularly in quantum mechanics, yet their proper quantification remains a subject of debate. In this work, we aimed to discuss the challenge of defining a reliable measure of total correlations. We first outlined the essential properties that an effective correlation measure should satisfy and reviewed existing measures, including quantum mutual information, the p-norm of the correlation matrix, and the recently defined quantum Pearson correlation coefficient. Additionally, we introduced new measures based on Rényi and Tsallis relative entropies, as well as the Kullback–Leibler divergence. Our analysis revealed that while quantum mutual information, the p-norm, and the Pearson measure exhibit equivalence for two-qubit systems, they all suffer from an ordering problem. Despite criticisms regarding its reliability, we argued that QMI remains a valid measure of total correlations. Full article

(This article belongs to the Section Quantum Information)

34 pages, 574 KB

Open AccessArticle

Optimum Achievable Rates in Two Random Number Generation Problems with f-Divergences Using Smooth Rényi Entropy

by Ryo Nomura and Hideki Yagi

Entropy 2024, 26(9), 766; https://doi.org/10.3390/e26090766 - 6 Sep 2024

Cited by 4 | Viewed by 1475

Abstract

Two typical fixed-length random number generation problems in information theory are considered for general sources. One is the source resolvability problem and the other is the intrinsic randomness problem. In each of these problems, the optimum achievable rate with respect to the given approximation measure is one of our main concerns and has been characterized using two different information quantities: the information spectrum and the smooth Rényi entropy. Recently, optimum achievable rates with respect to f-divergences have been characterized using the information spectrum quantity. The f-divergence is a general non-negative measure between two probability distributions on the basis of a convex function f. The class of f-divergences includes several important measures such as the variational distance, the KL divergence, the Hellinger distance and so on. Hence, it is meaningful to consider the random number generation problems with respect to f-divergences. However, optimum achievable rates with respect to f-divergences using the smooth Rényi entropy have not been clarified yet in both problems. In this paper, we try to analyze the optimum achievable rates using the smooth Rényi entropy and to extend the class of f-divergence. To do so, we first derive general formulas of the first-order optimum achievable rates with respect to f-divergences in both problems under the same conditions as imposed by previous studies. Next, we relax the conditions on f-divergence and generalize the obtained general formulas. Then, we particularize our general formulas to several specified functions f. As a result, we reveal that it is easy to derive optimum achievable rates for several important measures from our general formulas. Furthermore, a kind of duality between the resolvability and the intrinsic randomness is revealed in terms of the smooth Rényi entropy. Second-order optimum achievable rates and optimistic achievable rates are also investigated. Full article

(This article belongs to the Section Information Theory, Probability and Statistics)
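
As a reminder of how one convex generator f yields the measures named in this abstract, the following sketch evaluates D_f(P‖Q) = Σ q_i f(p_i/q_i) for finite distributions. It assumes every q_i is positive, normalizes the variational distance with a factor 1/2, and uses the squared Hellinger distance without a 1/2 factor. Conventions differ between papers, so these choices and the function names are illustrative only.

```python
import math


def f_divergence(p, q, f):
    """D_f(P||Q) = sum_i q_i * f(p_i / q_i), assuming every q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def f_kl(t):
    """Generator of the KL divergence: f(t) = t log t, with f(0) = 0."""
    return t * math.log(t) if t > 0 else 0.0


def f_variational(t):
    """Generator of the variational distance with the 1/2 normalization."""
    return 0.5 * abs(t - 1.0)


def f_squared_hellinger(t):
    """Generator of the squared Hellinger distance sum_i (sqrt p_i - sqrt q_i)^2."""
    return (math.sqrt(t) - 1.0) ** 2


p = [0.5, 0.5]
q = [0.25, 0.75]
kl = f_divergence(p, q, f_kl)
tv = f_divergence(p, q, f_variational)
hellinger_sq = f_divergence(p, q, f_squared_hellinger)
print(kl, tv, hellinger_sq)
```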

32 pages, 414 KB

Open AccessFeature PaperArticle

Statistical Divergence and Paths Thereof to Socioeconomic Inequality and to Renewal Processes

by Iddo Eliazar

Entropy 2024, 26(7), 565; https://doi.org/10.3390/e26070565 - 30 Jun 2024

Viewed by 1583

Abstract

This paper establishes a general framework for measuring statistical divergence. Namely, with regard to a pair of random variables that share a common range of values: quantifying the distance of the statistical distribution of one random variable from that of the other. The general framework is then applied to the topics of socioeconomic inequality and renewal processes. The general framework and its applications are shown to yield and to relate to the following: f-divergence, Hellinger divergence, Rényi divergence, and Kullback–Leibler divergence (also known as relative entropy); the Lorenz curve and socioeconomic inequality indices; the Gini index and its generalizations; the divergence of renewal processes from the Poisson process; and the divergence of anomalous relaxation from regular relaxation. Presenting a ‘fresh’ perspective on statistical divergence, this paper offers its readers a simple and transparent construction of statistical-divergence gauges, as well as novel paths that lead from statistical divergence to the aforementioned topics. Full article

(This article belongs to the Section Information Theory, Probability and Statistics)

41 pages, 536 KB

Open AccessArticle

Mechanisms for Robust Local Differential Privacy

by Milan Lopuhaä-Zwakenberg and Jasper Goseling

Entropy 2024, 26(3), 233; https://doi.org/10.3390/e26030233 - 6 Mar 2024

Cited by 5 | Viewed by 4403

Abstract

We consider privacy mechanisms for releasing data $X = (S, U)$, where S is sensitive and U is non-sensitive. We introduce the robust local differential privacy (RLDP) framework, which provides strong privacy guarantees, while preserving utility. This is achieved by providing robust privacy: our mechanisms do not only provide privacy with respect to a publicly available estimate of the unknown true distribution, but also with respect to similar distributions. Such robustness mitigates the potential privacy leaks that might arise from the difference between the true distribution and the estimated one. At the same time, we mitigate the utility penalties that come with ordinary differential privacy, which involves making worst-case assumptions and dealing with extreme cases. We achieve robustness in privacy by constructing an uncertainty set based on a Rényi divergence. By analyzing the structure of this set and approximating it with a polytope, we can use robust optimization to find mechanisms with high utility. However, this relies on vertex enumeration and becomes computationally inaccessible for large input spaces. Therefore, we also introduce two low-complexity algorithms that build on existing LDP mechanisms. We evaluate the utility and robustness of the mechanisms using numerical experiments and demonstrate that our mechanisms provide robust privacy, while achieving a utility that is close to optimal. Full article

(This article belongs to the Special Issue Information Theory for Distributed Systems)

16 pages, 656 KB

Open AccessFeature PaperArticle

Divergences Induced by the Cumulant and Partition Functions of Exponential Families and Their Deformations Induced by Comparative Convexity

by Frank Nielsen

Entropy 2024, 26(3), 193; https://doi.org/10.3390/e26030193 - 23 Feb 2024

Cited by 2 | Viewed by 3130

Abstract

Exponential families are statistical models which are the workhorses in statistics, information theory, and machine learning, among others. An exponential family can either be normalized subtractively by its cumulant or free energy function, or equivalently normalized divisively by its partition function. Both the cumulant and partition functions are strictly convex and smooth functions inducing corresponding pairs of Bregman and Jensen divergences. It is well known that skewed Bhattacharyya distances between the probability densities of an exponential family amount to skewed Jensen divergences induced by the cumulant function between their corresponding natural parameters, and that in limit cases the sided Kullback–Leibler divergences amount to reverse-sided Bregman divergences. In this work, we first show that the $\alpha$-divergences between non-normalized densities of an exponential family amount to scaled $\alpha$-skewed Jensen divergences induced by the partition function. We then show how comparative convexity with respect to a pair of quasi-arithmetical means allows both convex functions and their arguments to be deformed, thereby defining dually flat spaces with corresponding divergences when ordinary convexity is preserved. Full article
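
The well-known identity recalled in this abstract (skewed Bhattacharyya distance equals a skewed Jensen divergence of the cumulant function) can be checked on the Poisson family, whose natural parameter is θ = log λ and whose cumulant function is F(θ) = exp(θ). The sketch is a numerical illustration under those standard facts; the truncation point `x_max` and the function names are assumptions made here.

```python
import math


def poisson_log_pmf(x, rate):
    """Log of the Poisson probability mass function."""
    return x * math.log(rate) - rate - math.lgamma(x + 1)


def skewed_bhattacharyya(rate1, rate2, alpha, x_max=200):
    """-log sum_x p1(x)^alpha * p2(x)^(1 - alpha), truncated at x_max."""
    total = sum(
        math.exp(alpha * poisson_log_pmf(x, rate1)
                 + (1.0 - alpha) * poisson_log_pmf(x, rate2))
        for x in range(x_max + 1)
    )
    return -math.log(total)


def cumulant(theta):
    """Cumulant (log-normalizer) of the Poisson family in natural coordinates."""
    return math.exp(theta)


def skewed_jensen(theta1, theta2, alpha):
    """alpha*F(t1) + (1-alpha)*F(t2) - F(alpha*t1 + (1-alpha)*t2)."""
    mixed = alpha * theta1 + (1.0 - alpha) * theta2
    return alpha * cumulant(theta1) + (1.0 - alpha) * cumulant(theta2) - cumulant(mixed)


rate1, rate2, alpha = 3.0, 7.0, 0.3
bhat = skewed_bhattacharyya(rate1, rate2, alpha)
jensen = skewed_jensen(math.log(rate1), math.log(rate2), alpha)
print(bhat, jensen)
```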

32 pages, 660 KB

Open AccessFeature PaperEditor’s ChoiceArticle

Smoothing of Binary Codes, Uniform Distributions, and Applications

by Madhura Pathegama and Alexander Barg

Entropy 2023, 25(11), 1515; https://doi.org/10.3390/e25111515 - 5 Nov 2023

Cited by 6 | Viewed by 3341

Abstract

The action of a noise operator on a code transforms it into a distribution on the respective space. Some common examples from information theory include Bernoulli noise acting on a code in the Hamming space and Gaussian noise acting on a lattice in the Euclidean space. We aim to characterize the cases when the output distribution is close to the uniform distribution on the space, as measured by the Rényi divergence of order $\alpha \in (1, \infty]$. A version of this question is known as the channel resolvability problem in information theory, and it has implications for security guarantees in wiretap channels, error correction, discrepancy, worst-to-average case complexity reductions, and many other problems. Our work quantifies the requirements for asymptotic uniformity (perfect smoothing) and identifies explicit code families that achieve it under the action of the Bernoulli and ball noise operators on the code. We derive expressions for the minimum rate of codes required to attain asymptotically perfect smoothing. In proving our results, we leverage recent results from harmonic analysis of functions on the Hamming space. Another result pertains to the use of code families in Wyner’s transmission scheme on the binary wiretap channel. We identify explicit families that guarantee strong secrecy when applied in this scheme, showing that nested Reed–Muller codes can transmit messages reliably and securely over a binary symmetric wiretap channel with a positive rate. Finally, we establish a connection between smoothing and error correction in the binary symmetric channel. Full article

(This article belongs to the Special Issue Extremal and Additive Combinatorial Aspects in Information Theory)
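
The smoothing question in this abstract can be made concrete for very short codes by brute force. The sketch assumes that "Bernoulli noise acting on a code" means sending a uniformly chosen codeword through a binary symmetric channel with crossover probability `delta`, and it measures the Rényi divergence of the output from the uniform distribution on {0,1}^n in bits. The two example codes and all names are illustrative; the enumeration is exponential in n and is only meant for toy sizes.

```python
import itertools
import math


def hamming_distance(u, v):
    """Number of positions where the two binary tuples differ."""
    return sum(a != b for a, b in zip(u, v))


def smoothed_distribution(code, n, delta):
    """Output law of a uniform codeword sent through BSC(delta)."""
    dist = {}
    for y in itertools.product((0, 1), repeat=n):
        prob = 0.0
        for c in code:
            d = hamming_distance(y, c)
            prob += delta ** d * (1.0 - delta) ** (n - d)
        dist[y] = prob / len(code)
    return dist


def renyi_to_uniform(dist, n, alpha):
    """Renyi divergence (bits) of order alpha in (1, inf] to uniform on {0,1}^n."""
    size = 2 ** n
    if math.isinf(alpha):
        return math.log2(max(dist.values()) * size)
    power_sum = sum(p ** alpha for p in dist.values())
    return math.log2(power_sum * size ** (alpha - 1.0)) / (alpha - 1.0)


n, delta = 6, 0.1
space = list(itertools.product((0, 1), repeat=n))
repetition_code = [tuple([0] * n), tuple([1] * n)]
even_weight_code = [y for y in space if sum(y) % 2 == 0]

d_repetition = renyi_to_uniform(smoothed_distribution(repetition_code, n, delta), n, 2.0)
d_even = renyi_to_uniform(smoothed_distribution(even_weight_code, n, delta), n, 2.0)
d_even_inf = renyi_to_uniform(smoothed_distribution(even_weight_code, n, delta), n, math.inf)
print(d_repetition, d_even, d_even_inf)
```

In this toy setting the higher-rate even-weight code is smoothed far more effectively than the repetition code, which matches the abstract's framing of smoothing in terms of a minimum code rate.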

29 pages, 1775 KB

Open AccessFeature PaperArticle

Variational Inference via Rényi Bound Optimization and Multiple-Source Adaptation

by Dana Zalman (Oshri) and Shai Fine

Entropy 2023, 25(10), 1468; https://doi.org/10.3390/e25101468 - 20 Oct 2023

Cited by 1 | Viewed by 2913

Abstract

Variational inference provides a way to approximate probability densities through optimization. It does so by optimizing an upper or a lower bound of the likelihood of the observed data (the evidence). The classic variational inference approach suggests maximizing the Evidence Lower Bound (ELBO). Recent studies proposed to optimize the variational Rényi bound (VR) and the $\chi$ upper bound. However, these estimates, which are based on the Monte Carlo (MC) approximation, either underestimate the bound or exhibit a high variance. In this work, we introduce a new upper bound, termed the Variational Rényi Log Upper bound (VRLU), which is based on the existing VR bound. In contrast to the existing VR bound, the MC approximation of the VRLU bound maintains the upper bound property. Furthermore, we devise a (sandwiched) upper–lower bound variational inference method, termed the Variational Rényi Sandwich (VRS), to jointly optimize the upper and lower bounds. We present a set of experiments, designed to evaluate the new VRLU bound and to compare the VRS method with the classic Variational Autoencoder (VAE) and the VR methods. Next, we apply the VRS approximation to the Multiple-Source Adaptation problem (MSA). MSA is a real-world scenario where data are collected from multiple sources that differ from one another by their probability distribution over the input space. The main aim is to combine fairly accurate predictive models from these sources and create an accurate model for new, mixed target domains. However, many domain adaptation methods assume prior knowledge of the data distribution in the source domains. In this work, we apply the suggested VRS density estimate to the Multiple-Source Adaptation problem (MSA) and show, both theoretically and empirically, that it provides tighter error bounds and improved performance, compared to leading MSA methods. Full article

(This article belongs to the Special Issue Entropy: The Cornerstone of Machine Learning)
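
For orientation, the Monte Carlo approximation of the existing VR bound mentioned in this abstract can be sketched on a toy model. The sketch assumes the usual form of the bound, L_α = (1/(1−α)) log E_q[(p(x, z)/q(z))^(1−α)] for α ≠ 1, and a conjugate Gaussian model (z ~ N(0, 1), x | z ~ N(z, 1)) chosen here only because its evidence and posterior are known in closed form. It does not implement the VRLU or VRS methods of the paper, and all names are hypothetical.

```python
import math
import random


def log_normal_pdf(value, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (value - mean) ** 2 / var)


def vr_bound_mc(x, q_mean, q_var, alpha, num_samples, rng):
    """Monte Carlo estimate of the variational Renyi bound (alpha != 1).

    Toy model (an assumption of this sketch): z ~ N(0, 1), x | z ~ N(z, 1),
    with variational family q(z) = N(q_mean, q_var).
    """
    scaled_log_weights = []
    for _ in range(num_samples):
        z = rng.gauss(q_mean, math.sqrt(q_var))
        log_joint = log_normal_pdf(z, 0.0, 1.0) + log_normal_pdf(x, z, 1.0)
        log_q = log_normal_pdf(z, q_mean, q_var)
        scaled_log_weights.append((1.0 - alpha) * (log_joint - log_q))
    # Log-mean-exp, computed stably by factoring out the maximum.
    top = max(scaled_log_weights)
    mean_exp = sum(math.exp(v - top) for v in scaled_log_weights) / num_samples
    return (top + math.log(mean_exp)) / (1.0 - alpha)


x_obs = 1.0
log_evidence = log_normal_pdf(x_obs, 0.0, 2.0)  # marginal of x is N(0, 2)
rng = random.Random(0)

# With q equal to the exact posterior N(x/2, 1/2), every weight equals p(x).
exact = vr_bound_mc(x_obs, x_obs / 2.0, 0.5, alpha=0.5, num_samples=100, rng=rng)
# With a mismatched q, the estimate generally differs from the log evidence.
rough = vr_bound_mc(x_obs, 0.0, 1.0, alpha=0.5, num_samples=100, rng=rng)
print(log_evidence, exact, rough)
```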

Search Results (70)
