Entropy

Research

30 pages, 4167 KiB

Open AccessArticle

Combinatorial Optimization with Information Geometry: The Newton Method

by Luigi Malagò and Giovanni Pistone

Entropy 2014, 16(8), 4260-4289; https://doi.org/10.3390/e16084260 - 28 Jul 2014

Cited by 11 | Viewed by 7667

We discuss the use of the Newton method in the computation of max(p → Ε_p[f]), where p belongs to a statistical exponential family on a finite state space. In a number of papers, the authors have applied first order search methods [...] Read more.

We discuss the use of the Newton method in the computation of max(p → Ε_p[f]), where p belongs to a statistical exponential family on a finite state space. In a number of papers, the authors have applied first order search methods based on information geometry. Second order methods have been widely used in optimization on manifolds, e.g., matrix manifolds, but appear to be new in statistical manifolds. These methods require the computation of the Riemannian Hessian in a statistical manifold. We use a non-parametric formulation of information geometry in view of further applications in the continuous state space cases, where the construction of a proper Riemannian structure is still an open problem. Full article

(This article belongs to the Special Issue Information Geometry)

► Show Figures

36 pages, 1740 KiB

Open AccessArticle

Network Decomposition and Complexity Measures: An Information Geometrical Approach

by Masatoshi Funabashi

Entropy 2014, 16(7), 4132-4167; https://doi.org/10.3390/e16074132 - 23 Jul 2014

Cited by 6 | Viewed by 6388

Abstract

We consider the graph representation of the stochastic model with n binary variables, and develop an information theoretical framework to measure the degree of statistical association existing between subsystems as well as the ones represented by each edge of the graph representation. Besides, [...] Read more.

We consider the graph representation of the stochastic model with n binary variables, and develop an information theoretical framework to measure the degree of statistical association existing between subsystems as well as the ones represented by each edge of the graph representation. Besides, we consider the novel measures of complexity with respect to the system decompositionability, by introducing the geometric product of Kullback–Leibler (KL-) divergence. The novel complexity measures satisfy the boundary condition of vanishing at the limit of completely random and ordered state, and also with the existence of independent subsystem of any size. Such complexity measures based on the geometric means are relevant to the heterogeneity of dependencies between subsystems, and the amount of information propagation shared entirely in the system. Full article

(This article belongs to the Special Issue Information Geometry)

► Show Figures

Graphical abstract

13 pages, 262 KiB

Open AccessArticle

Using Geometry to Select One Dimensional Exponential Families That Are Monotone Likelihood Ratio in the Sample Space, Are Weakly Unimodal and Can Be Parametrized by a Measure of Central Tendency

by Paul Vos and Karim Anaya-Izquierdo

Entropy 2014, 16(7), 4088-4100; https://doi.org/10.3390/e16074088 - 18 Jul 2014

Cited by 1 | Viewed by 5732

Abstract

One dimensional exponential families on finite sample spaces are studied using the geometry of the simplex Δ_n°-1 and that of a transformation V_n-1 of its interior. This transformation is the natural parameter space associated with the family of multinomial [...] Read more.

One dimensional exponential families on finite sample spaces are studied using the geometry of the simplex Δ_n°-1 and that of a transformation V_n-1 of its interior. This transformation is the natural parameter space associated with the family of multinomial distributions. The space V_n-1 is partitioned into cones that are used to find one dimensional families with desirable properties for modeling and inference. These properties include the availability of uniformly most powerful tests and estimators that exhibit optimal properties in terms of variability and unbiasedness. Full article

(This article belongs to the Special Issue Information Geometry)

► Show Figures

17 pages, 2418 KiB

Open AccessArticle

New Riemannian Priors on the Univariate Normal Model

by Salem Said, Lionel Bombrun and Yannick Berthoumieu

Entropy 2014, 16(7), 4015-4031; https://doi.org/10.3390/e16074015 - 17 Jul 2014

Cited by 18 | Viewed by 5967

Abstract

The current paper introduces new prior distributions on the univariate normal model, with the aim of applying them to the classification of univariate normal populations. These new prior distributions are entirely based on the Riemannian geometry of the univariate normal model, so that [...] Read more.

The current paper introduces new prior distributions on the univariate normal model, with the aim of applying them to the classification of univariate normal populations. These new prior distributions are entirely based on the Riemannian geometry of the univariate normal model, so that they can be thought of as “Riemannian priors”. Precisely, if {pθ ; θ ∈ Θ} is any parametrization of the univariate normal model, the paper considers prior distributions G(

\bar{θ}

, γ) with hyperparameters

\bar{θ}

∈ Θ and γ > 0, whose density with respect to Riemannian volume is proportional to exp(−d2(θ, $\bar{θ}$ )/2γ²), where d²(θ,

\bar{θ}

) is the square of Rao’s Riemannian distance. The distributions G(

\bar{θ}

, γ) are termed Gaussian distributions on the univariate normal model. The motivation for considering a distribution G(

\bar{θ}

, γ) is that this distribution gives a geometric representation of a class or cluster of univariate normal populations. Indeed, G(

\bar{θ}

, γ) has a unique mode

\bar{θ}

(precisely,

\bar{θ}

is the unique Riemannian center of mass of G(

\bar{θ}

, γ), as shown in the paper), and its dispersion away from

\bar{θ}

is given by γ. Therefore, one thinks of members of the class represented by G(

\bar{θ}

, γ) as being centered around

\bar{θ}

and lying within a typical distance determined by γ. The paper defines rigorously the Gaussian distributions G(

\bar{θ}

, γ) and describes an algorithm for computing maximum likelihood estimates of their hyperparameters. Based on this algorithm and on the Laplace approximation, it describes how the distributions G(

\bar{θ}

, γ) can be used as prior distributions for Bayesian classification of large univariate normal populations. In a concrete application to texture image classification, it is shown that this leads to an improvement in performance over the use of conjugate priors. Full article

(This article belongs to the Special Issue Information Geometry)

► Show Figures

11 pages, 215 KiB

Open AccessArticle

The Entropy-Based Quantum Metric

by Roger Balian

Entropy 2014, 16(7), 3878-3888; https://doi.org/10.3390/e16073878 - 15 Jul 2014

Cited by 24 | Viewed by 6787

Abstract

The von Neumann entropy S( $\hat{D}$ ) generates in the space of quantum density matrices

\hat{D}

the Riemannian metric ds² = −d²S( $\hat{D}$ ), which is physically founded and which characterises the amount of quantum information [...] Read more.

The von Neumann entropy S( $\hat{D}$ ) generates in the space of quantum density matrices

\hat{D}

the Riemannian metric ds² = −d²S( $\hat{D}$ ), which is physically founded and which characterises the amount of quantum information lost by mixing

\hat{D}

and

\hat{D}

+ d $\hat{D}$ . A rich geometric structure is thereby implemented in quantum mechanics. It includes a canonical mapping between the spaces of states and of observables, which involves the Legendre transform of S( $\hat{D}$ ). The Kubo scalar product is recovered within the space of observables. Applications are given to equilibrium and non equilibrium quantum statistical mechanics. There the formalism is specialised to the relevant space of observables and to the associated reduced states issued from the maximum entropy criterion, which result from the exact states through an orthogonal projection. Von Neumann’s entropy specialises into a relevant entropy. Comparison is made with other metrics. The Riemannian properties of the metric ds² = −d²S( $\hat{D}$ ) are derived. The curvature arises from the non-Abelian nature of quantum mechanics; its general expression and its explicit form for q-bits are given, as well as geodesics. Full article

(This article belongs to the Special Issue Information Geometry)

16 pages, 743 KiB

Open AccessArticle

Variational Bayes for Regime-Switching Log-Normal Models

by Hui Zhao and Paul Marriott

Entropy 2014, 16(7), 3832-3847; https://doi.org/10.3390/e16073832 - 14 Jul 2014

Cited by 3 | Viewed by 5577

Abstract

The power of projection using divergence functions is a major theme in information geometry. One version of this is the variational Bayes (VB) method. This paper looks at VB in the context of other projection-based methods in information geometry. It also describes how [...] Read more.

The power of projection using divergence functions is a major theme in information geometry. One version of this is the variational Bayes (VB) method. This paper looks at VB in the context of other projection-based methods in information geometry. It also describes how to apply VB to the regime-switching log-normal model and how it provides a computationally fast solution to quantify the uncertainty in the model specification. The results show that the method can recover exactly the model structure, gives the reasonable point estimates and is very computationally efficient. The potential problems of the method in quantifying the parameter uncertainty are discussed. Full article

(This article belongs to the Special Issue Information Geometry)

► Show Figures

19 pages, 397 KiB

Open AccessArticle

Extending the Extreme Physical Information to Universal Cognitive Models via a Confident Information First Principle

by Xiaozhao Zhao, Yuexian Hou, Dawei Song and Wenjie Li

Entropy 2014, 16(7), 3670-3688; https://doi.org/10.3390/e16073670 - 1 Jul 2014

Cited by 4 | Viewed by 6784

Abstract

The principle of extreme physical information (EPI) can be used to derive many known laws and distributions in theoretical physics by extremizing the physical information loss K, i.e., the difference between the observed Fisher information I and the intrinsic information bound J of [...] Read more.

The principle of extreme physical information (EPI) can be used to derive many known laws and distributions in theoretical physics by extremizing the physical information loss K, i.e., the difference between the observed Fisher information I and the intrinsic information bound J of the physical phenomenon being measured. However, for complex cognitive systems of high dimensionality (e.g., human language processing and image recognition), the information bound J could be excessively larger than I (J ≫ I), due to insufficient observation, which would lead to serious over-fitting problems in the derivation of cognitive models. Moreover, there is a lack of an established exact invariance principle that gives rise to the bound information in universal cognitive systems. This limits the direct application of EPI. To narrow down the gap between I and J, in this paper, we propose a confident-information-first (CIF) principle to lower the information bound J by preserving confident parameters and ruling out unreliable or noisy parameters in the probability density function being measured. The confidence of each parameter can be assessed by its contribution to the expected Fisher information distance between the physical phenomenon and its observations. In addition, given a specific parametric representation, this contribution can often be directly assessed by the Fisher information, which establishes a connection with the inverse variance of any unbiased estimate for the parameter via the Cramér–Rao bound. We then consider the dimensionality reduction in the parameter spaces of binary multivariate distributions. We show that the single-layer Boltzmann machine without hidden units (SBM) can be derived using the CIF principle. An illustrative experiment is conducted to show how the CIF principle improves the density estimation performance. Full article

(This article belongs to the Special Issue Information Geometry)

► Show Figures

29 pages, 354 KiB

Open AccessArticle

On Clustering Histograms with k-Means by Using Mixed α-Divergences

by Frank Nielsen, Richard Nock and Shun-ichi Amari

Entropy 2014, 16(6), 3273-3301; https://doi.org/10.3390/e16063273 - 17 Jun 2014

Cited by 24 | Viewed by 10301

Abstract

Clustering sets of histograms has become popular thanks to the success of the generic method of bag-of-X used in text categorization and in visual categorization applications. In this paper, we investigate the use of a parametric family of distortion measures, called the α-divergences, [...] Read more.

Clustering sets of histograms has become popular thanks to the success of the generic method of bag-of-X used in text categorization and in visual categorization applications. In this paper, we investigate the use of a parametric family of distortion measures, called the α-divergences, for clustering histograms. Since it usually makes sense to deal with symmetric divergences in information retrieval systems, we symmetrize the α -divergences using the concept of mixed divergences. First, we present a novel extension of k-means clustering to mixed divergences. Second, we extend the k-means++ seeding to mixed α-divergences and report a guaranteed probabilistic bound. Finally, we describe a soft clustering technique for mixed α-divergences. Full article

(This article belongs to the Special Issue Information Geometry)

► Show Figures

Graphical abstract

27 pages, 354 KiB

Open AccessArticle

On the Fisher Metric of Conditional Probability Polytopes

by Guido Montúfar, Johannes Rauh and Nihat Ay

Entropy 2014, 16(6), 3207-3233; https://doi.org/10.3390/e16063207 - 6 Jun 2014

Cited by 10 | Viewed by 8073

Abstract

We consider three different approaches to define natural Riemannian metrics on polytopes of stochastic matrices. First, we define a natural class of stochastic maps between these polytopes and give a metric characterization of Chentsov type in terms of invariance with respect to these [...] Read more.

We consider three different approaches to define natural Riemannian metrics on polytopes of stochastic matrices. First, we define a natural class of stochastic maps between these polytopes and give a metric characterization of Chentsov type in terms of invariance with respect to these maps. Second, we consider the Fisher metric defined on arbitrary polytopes through their embeddings as exponential families in the probability simplex. We show that these metrics can also be characterized by an invariance principle with respect to morphisms of exponential families. Third, we consider the Fisher metric resulting from embedding the polytope of stochastic matrices in a simplex of joint distributions by specifying a marginal distribution. All three approaches result in slight variations of products of Fisher metrics. This is consistent with the nature of polytopes of stochastic matrices, which are Cartesian products of probability simplices. The first approach yields a scaled product of Fisher metrics; the second, a product of Fisher metrics; and the third, a product of Fisher metrics scaled by the marginal distribution. Full article

(This article belongs to the Special Issue Information Geometry)

► Show Figures

Graphical abstract

29 pages, 611 KiB

Open AccessArticle

Information-Geometric Markov Chain Monte Carlo Methods Using Diffusions

by Samuel Livingstone and Mark Girolami

Entropy 2014, 16(6), 3074-3102; https://doi.org/10.3390/e16063074 - 3 Jun 2014

Cited by 31 | Viewed by 9141

Abstract

Recent work incorporating geometric ideas in Markov chain Monte Carlo is reviewed in order to highlight these advances and their possible application in a range of domains beyond statistics. A full exposition of Markov chains and their use in Monte Carlo simulation for [...] Read more.

Recent work incorporating geometric ideas in Markov chain Monte Carlo is reviewed in order to highlight these advances and their possible application in a range of domains beyond statistics. A full exposition of Markov chains and their use in Monte Carlo simulation for statistical inference and molecular dynamics is provided, with particular emphasis on methods based on Langevin diffusions. After this, geometric concepts in Markov chain Monte Carlo are introduced. A full derivation of the Langevin diffusion on a Riemannian manifold is given, together with a discussion of the appropriate Riemannian metric choice for different problems. A survey of applications is provided, and some open questions are discussed. Full article

(This article belongs to the Special Issue Information Geometry)

► Show Figures

23 pages, 293 KiB

Open AccessArticle

Asymptotically Constant-Risk Predictive Densities When the Distributions of Data and Target Variables Are Different

by Keisuke Yano and Fumiyasu Komaki

Entropy 2014, 16(6), 3026-3048; https://doi.org/10.3390/e16063026 - 28 May 2014

Viewed by 6307

Abstract

We investigate the asymptotic construction of constant-risk Bayesian predictive densities under the Kullback–Leibler risk when the distributions of data and target variables are different and have a common unknown parameter. It is known that the Kullback–Leibler risk is asymptotically equal to a trace [...] Read more.

We investigate the asymptotic construction of constant-risk Bayesian predictive densities under the Kullback–Leibler risk when the distributions of data and target variables are different and have a common unknown parameter. It is known that the Kullback–Leibler risk is asymptotically equal to a trace of the product of two matrices: the inverse of the Fisher information matrix for the data and the Fisher information matrix for the target variables. We assume that the trace has a unique maximum point with respect to the parameter. We construct asymptotically constant-risk Bayesian predictive densities using a prior depending on the sample size. Further, we apply the theory to the subminimax estimator problem and the prediction based on the binary regression model. Full article

(This article belongs to the Special Issue Information Geometry)

► Show Figures

15 pages, 237 KiB

Open AccessArticle

Information Geometric Complexity of a Trivariate Gaussian Statistical Model

by Domenico Felice, Carlo Cafaro and Stefano Mancini

Entropy 2014, 16(6), 2944-2958; https://doi.org/10.3390/e16062944 - 26 May 2014

Cited by 8 | Viewed by 5583

Abstract

We evaluate the information geometric complexity of entropic motion on low-dimensional Gaussian statistical manifolds in order to quantify how difficult it is to make macroscopic predictions about systems in the presence of limited information. Specifically, we observe that the complexity of such entropic [...] Read more.

We evaluate the information geometric complexity of entropic motion on low-dimensional Gaussian statistical manifolds in order to quantify how difficult it is to make macroscopic predictions about systems in the presence of limited information. Specifically, we observe that the complexity of such entropic inferences not only depends on the amount of available pieces of information but also on the manner in which such pieces are correlated. Finally, we uncover that, for certain correlational structures, the impossibility of reaching the most favorable configuration from an entropic inference viewpoint seems to lead to an information geometric analog of the well-known frustration effect that occurs in statistical physics. Full article

(This article belongs to the Special Issue Information Geometry)

► Show Figures

16 pages, 220 KiB

Open AccessArticle

F-Geometry and Amari’s α-Geometry on a Statistical Manifold

by Harsha K. V. and Subrahamanian Moosath K. S.

Entropy 2014, 16(5), 2472-2487; https://doi.org/10.3390/e16052472 - 6 May 2014

Cited by 9 | Viewed by 6542

Abstract

In this paper, we introduce a geometry called F-geometry on a statistical manifold S using an embedding F of S into the space R_X of random variables. Amari’s α-geometry is a special case of F-geometry. Then using the embedding [...] Read more.

In this paper, we introduce a geometry called F-geometry on a statistical manifold S using an embedding F of S into the space R_X of random variables. Amari’s α-geometry is a special case of F-geometry. Then using the embedding F and a positive smooth function G, we introduce (F,G)-metric and (F,G)-connections that enable one to consider weighted Fisher information metric and weighted connections. The necessary and sufficient condition for two (F,G)-connections to be dual with respect to the (F,G)-metric is obtained. Then we show that Amari’s 0-connection is the only self dual F-connection with respect to the Fisher information metric. Invariance properties of the geometric structures are discussed, which proved that Amari’s α-connections are the only F-connections that are invariant under smooth one-to-one transformations of the random variables. Full article

(This article belongs to the Special Issue Information Geometry)

► Show Figures

18 pages, 735 KiB

Open AccessArticle

Computational Information Geometry in Statistics: Theory and Practice

by Frank Critchley and Paul Marriott

Entropy 2014, 16(5), 2454-2471; https://doi.org/10.3390/e16052454 - 2 May 2014

Cited by 7 | Viewed by 8705

Abstract

A broad view of the nature and potential of computational information geometry in statistics is offered. This new area suitably extends the manifold-based approach of classical information geometry to a simplicial setting, in order to obtain an operational universal model space. Additional underlying [...] Read more.

A broad view of the nature and potential of computational information geometry in statistics is offered. This new area suitably extends the manifold-based approach of classical information geometry to a simplicial setting, in order to obtain an operational universal model space. Additional underlying theory and illustrative real examples are presented. In the inﬁnite-dimensional case, challenges inherent in this ambitious overall agenda are highlighted and promising new methodologies indicated. Full article

(This article belongs to the Special Issue Information Geometry)

► Show Figures

15 pages, 246 KiB

Open AccessArticle

Information Geometry of Positive Measures and Positive-Definite Matrices: Decomposable Dually Flat Structure

by Shun-ichi Amari

Entropy 2014, 16(4), 2131-2145; https://doi.org/10.3390/e16042131 - 14 Apr 2014

Cited by 23 | Viewed by 7600

Abstract

Information geometry studies the dually flat structure of a manifold, highlighted by the generalized Pythagorean theorem. The present paper studies a class of Bregman divergences called the (ρ,τ)-divergence. A (ρ,τ) -divergence generates a dually flat structure in the manifold of positive measures, as [...] Read more.

Information geometry studies the dually flat structure of a manifold, highlighted by the generalized Pythagorean theorem. The present paper studies a class of Bregman divergences called the (ρ,τ)-divergence. A (ρ,τ) -divergence generates a dually flat structure in the manifold of positive measures, as well as in the manifold of positive-definite matrices. The class is composed of decomposable divergences, which are written as a sum of componentwise divergences. Conversely, a decomposable dually flat divergence is shown to be a (ρ,τ) -divergence. A (ρ,τ) -divergence is determined from two monotone scalar functions, ρ and τ. The class includes the KL-divergence, α-, β- and (α, β)-divergences as special cases. The transformation between an affine parameter and its dual is easily calculated in the case of a decomposable divergence. Therefore, such a divergence is useful for obtaining the center for a cluster of points, which will be applied to classification and information retrieval in vision. For the manifold of positive-definite matrices, in addition to the dually flatness and decomposability, we require the invariance under linear transformations, in particular under orthogonal transformations. This opens a way to define a new class of divergences, called the (ρ,τ) -structure in the manifold of positive-definite matrices. Full article

(This article belongs to the Special Issue Information Geometry)

33 pages, 365 KiB

Open AccessArticle

Matrix Algebraic Properties of the Fisher Information Matrix of Stationary Processes

by André Klein

Entropy 2014, 16(4), 2023-2055; https://doi.org/10.3390/e16042023 - 8 Apr 2014

Cited by 5 | Viewed by 7276

Abstract

In this survey paper, a summary of results which are to be found in a series of papers, is presented. The subject of interest is focused on matrix algebraic properties of the Fisher information matrix (FIM) of stationary processes. The FIM is an [...] Read more.

In this survey paper, a summary of results which are to be found in a series of papers, is presented. The subject of interest is focused on matrix algebraic properties of the Fisher information matrix (FIM) of stationary processes. The FIM is an ingredient of the Cram´er-Rao inequality, and belongs to the basics of asymptotic estimation theory in mathematical statistics. The FIM is interconnected with the Sylvester, Bezout and tensor Sylvester matrices. Through these interconnections it is shown that the FIM of scalar and multiple stationary processes fulfill the resultant matrix property. A statistical distance measure involving entries of the FIM is presented. In quantum information, a different statistical distance measure is set forth. It is related to the Fisher information but where the information about one parameter in a particular measurement procedure is considered. The FIM of scalar stationary processes is also interconnected to the solutions of appropriate Stein equations, conditions for the FIM to verify certain Stein equations are formulated. The presence of Vandermonde matrices is also emphasized. Full article

(This article belongs to the Special Issue Information Geometry)

► Show Figures

35 pages, 6192 KiB

Open AccessArticle

Learning from Complex Systems: On the Roles of Entropy and Fisher Information in Pairwise Isotropic Gaussian Markov Random Fields

by Alexandre Levada

Entropy 2014, 16(2), 1002-1036; https://doi.org/10.3390/e16021002 - 18 Feb 2014

Cited by 10 | Viewed by 7656

Abstract

Markov random field models are powerful tools for the study of complex systems. However, little is known about how the interactions between the elements of such systems are encoded, especially from an information-theoretic perspective. In this paper, our goal is to enlighten the [...] Read more.

Markov random field models are powerful tools for the study of complex systems. However, little is known about how the interactions between the elements of such systems are encoded, especially from an information-theoretic perspective. In this paper, our goal is to enlighten the connection between Fisher information, Shannon entropy, information geometry and the behavior of complex systems modeled by isotropic pairwise Gaussian Markov random fields. We propose analytical expressions to compute local and global versions of these measures using Besag’s pseudo-likelihood function, characterizing the system’s behavior through its Fisher curve , a parametric trajectory across the information space that provides a geometric representation for the study of complex systems in which temperature deviates from infinity. Computational experiments show how the proposed tools can be useful in extracting relevant information from complex patterns. The obtained results quantify and support our main conclusion, which is: in terms of information, moving towards higher entropy states (A –> B) is different from moving towards lower entropy states (B –> A), since the Fisher curves are not the same, given a natural orientation (the direction of time). Full article

(This article belongs to the Special Issue Information Geometry)

► Show Figures

Journal Menu

Journal Browser

Information Geometry

Share This Special Issue

Special Issue Editor

Special Issue Information

Keywords

Benefits of Publishing in a Special Issue

Related Special Issue

Published Papers (17 papers)

Research

Further Information

Guidelines

MDPI Initiatives

Follow MDPI