Displaying article 1-17
p. 4260-4289
Received: 31 March 2014 / Revised: 10 July 2014 / Accepted: 11 July 2014 / Published: 28 July 2014
| Cited by 6 | PDF Full-text (4167 KB) | HTML Full-text | XML Full-text
Abstract: We discuss the use of the Newton method in the computation of $\max_{p} \mathrm{E}_{p}[f]$, where p belongs to a statistical exponential family on a finite state space. In a number of papers, the authors have applied first order search methods based on information geometry. Second order methods have been widely used in optimization on manifolds, e.g., matrix manifolds, but appear to be new in statistical manifolds. These methods require the computation of the Riemannian Hessian in a statistical manifold. We use a non-parametric formulation of information geometry in view of further applications in the continuous state space cases, where the construction of a proper Riemannian structure is still an open problem.
p. 4132-4167
Received: 28 March 2014 / Revised: 24 June 2014 / Accepted: 14 July 2014 / Published: 23 July 2014
| Cited by 1 | PDF Full-text (1740 KB) | HTML Full-text | XML Full-text
Abstract: We consider the graph representation of the stochastic model with n binary variables, and develop an information theoretical framework to measure the degree of statistical association existing between subsystems as well as the ones represented by each edge of the graph representation. We also consider novel measures of complexity with respect to system decomposability, obtained by introducing the geometric product of Kullback–Leibler (KL) divergences. The novel complexity measures satisfy the boundary condition of vanishing in the limits of completely random and completely ordered states, and also whenever an independent subsystem of any size exists. Such complexity measures based on geometric means are relevant to the heterogeneity of dependencies between subsystems, and to the amount of information propagation shared across the entire system.
p. 4088-4100
Received: 30 April 2014 / Revised: 30 June 2014 / Accepted: 14 July 2014 / Published: 18 July 2014
| PDF Full-text (262 KB) | HTML Full-text | XML Full-text
Abstract: One-dimensional exponential families on finite sample spaces are studied using the geometry of the simplex $\Delta_{n-1}^{\circ}$ and that of a transformation $V_{n-1}$ of its interior. This transformation is the natural parameter space associated with the family of multinomial distributions. The space $V_{n-1}$ is partitioned into cones that are used to find one-dimensional families with desirable properties for modeling and inference. These properties include the availability of uniformly most powerful tests and estimators that exhibit optimal properties in terms of variability and unbiasedness.
p. 4015-4031
Received: 17 April 2014 / Revised: 23 June 2014 / Accepted: 9 July 2014 / Published: 17 July 2014
| Cited by 3 | PDF Full-text (2418 KB) | HTML Full-text | XML Full-text
Abstract: The current paper introduces new prior distributions on the univariate normal model, with the aim of applying them to the classification of univariate normal populations. These new prior distributions are entirely based on the Riemannian geometry of the univariate normal model, so that they can be thought of as “Riemannian priors”. Precisely, if $\{p_{\theta};\ \theta \in \Theta\}$ is any parametrization of the univariate normal model, the paper considers prior distributions $G(\bar{\theta}, \gamma)$ with hyperparameters $\bar{\theta} \in \Theta$ and $\gamma > 0$, whose density with respect to Riemannian volume is proportional to $\exp(-d^{2}(\theta, \bar{\theta})/2\gamma^{2})$, where $d^{2}(\theta, \bar{\theta})$ is the square of Rao’s Riemannian distance. The distributions $G(\bar{\theta}, \gamma)$ are termed Gaussian distributions on the univariate normal model. The motivation for considering a distribution $G(\bar{\theta}, \gamma)$ is that this distribution gives a geometric representation of a class or cluster of univariate normal populations. Indeed, $G(\bar{\theta}, \gamma)$ has a unique mode $\bar{\theta}$ (precisely, $\bar{\theta}$ is the unique Riemannian center of mass of $G(\bar{\theta}, \gamma)$, as shown in the paper), and its dispersion away from $\bar{\theta}$ is given by $\gamma$. Therefore, one thinks of members of the class represented by $G(\bar{\theta}, \gamma)$ as being centered around $\bar{\theta}$ and lying within a typical distance determined by $\gamma$. The paper defines rigorously the Gaussian distributions $G(\bar{\theta}, \gamma)$ and describes an algorithm for computing maximum likelihood estimates of their hyperparameters. Based on this algorithm and on the Laplace approximation, it describes how the distributions $G(\bar{\theta}, \gamma)$ can be used as prior distributions for Bayesian classification of large univariate normal populations. In a concrete application to texture image classification, it is shown that this leads to an improvement in performance over the use of conjugate priors.
p. 3878-3888
Received: 15 May 2014 / Revised: 25 June 2014 / Accepted: 11 July 2014 / Published: 15 July 2014
| Cited by 2 | PDF Full-text (215 KB) | HTML Full-text | XML Full-text
Abstract: The von Neumann entropy $S(\hat{D})$ generates in the space of quantum density matrices $\hat{D}$ the Riemannian metric $ds^{2} = -d^{2}S(\hat{D})$, which is physically founded and which characterises the amount of quantum information lost by mixing $\hat{D}$ and $\hat{D} + d\hat{D}$. A rich geometric structure is thereby implemented in quantum mechanics. It includes a canonical mapping between the spaces of states and of observables, which involves the Legendre transform of $S(\hat{D})$. The Kubo scalar product is recovered within the space of observables. Applications are given to equilibrium and non-equilibrium quantum statistical mechanics. There the formalism is specialised to the relevant space of observables and to the associated reduced states issued from the maximum entropy criterion, which result from the exact states through an orthogonal projection. Von Neumann’s entropy specialises into a relevant entropy. Comparison is made with other metrics. The Riemannian properties of the metric $ds^{2} = -d^{2}S(\hat{D})$ are derived. The curvature arises from the non-Abelian nature of quantum mechanics; its general expression and its explicit form for q-bits are given, as well as geodesics.
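As a small illustration of the quantity this abstract starts from, the following sketch evaluates the von Neumann entropy of a single q-bit. It is not taken from the paper: it assumes the standard Bloch parametrization, in which the density matrix has eigenvalues (1 ± |r|)/2 for a Bloch vector r with |r| ≤ 1, and the function name `qubit_entropy` is hypothetical.

```python
import math


def qubit_entropy(bloch):
    """von Neumann entropy S(D) = -Tr(D log D) of one q-bit, in nats.

    Assumption (not from the abstract): D = (I + r . sigma) / 2 with
    Bloch vector r, so the eigenvalues of D are (1 + |r|)/2 and (1 - |r|)/2.
    """
    r = math.sqrt(sum(c * c for c in bloch))
    if r > 1.0 + 1e-12:
        raise ValueError("Bloch vector must have norm <= 1")
    r = min(r, 1.0)
    eigenvalues = ((1.0 + r) / 2.0, (1.0 - r) / 2.0)
    # The convention 0 * log 0 = 0 handles pure states (|r| = 1).
    return -sum(p * math.log(p) for p in eigenvalues if p > 0.0)
```

The entropy is largest for the maximally mixed state (r = 0) and vanishes for pure states (|r| = 1).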
p. 3832-3847
Received: 14 April 2014 / Revised: 12 June 2014 / Accepted: 7 July 2014 / Published: 14 July 2014
| PDF Full-text (743 KB) | HTML Full-text | XML Full-text
Abstract: The power of projection using divergence functions is a major theme in information geometry. One version of this is the variational Bayes (VB) method. This paper looks at VB in the context of other projection-based methods in information geometry. It also describes how to apply VB to the regime-switching log-normal model and how it provides a computationally fast solution to quantify the uncertainty in the model specification. The results show that the method can recover the model structure exactly, gives reasonable point estimates and is computationally very efficient. The potential problems of the method in quantifying the parameter uncertainty are discussed.
p. 3670-3688
Received: 25 March 2014 / Revised: 6 June 2014 / Accepted: 20 June 2014 / Published: 1 July 2014
| PDF Full-text (397 KB) | HTML Full-text | XML Full-text
Abstract: The principle of extreme physical information (EPI) can be used to derive many known laws and distributions in theoretical physics by extremizing the physical information loss K, i.e., the difference between the observed Fisher information I and the intrinsic information bound J of the physical phenomenon being measured. However, for complex cognitive systems of high dimensionality (e.g., human language processing and image recognition), the information bound J could be far larger than I (J ≫ I), due to insufficient observation, which would lead to serious over-fitting problems in the derivation of cognitive models. Moreover, there is no established exact invariance principle that gives rise to the bound information in universal cognitive systems. This limits the direct application of EPI. To narrow the gap between I and J, in this paper, we propose a confident-information-first (CIF) principle to lower the information bound J by preserving confident parameters and ruling out unreliable or noisy parameters in the probability density function being measured. The confidence of each parameter can be assessed by its contribution to the expected Fisher information distance between the physical phenomenon and its observations. In addition, given a specific parametric representation, this contribution can often be directly assessed by the Fisher information, which establishes a connection with the inverse variance of any unbiased estimate for the parameter via the Cramér–Rao bound. We then consider the dimensionality reduction in the parameter spaces of binary multivariate distributions. We show that the single-layer Boltzmann machine without hidden units (SBM) can be derived using the CIF principle. An illustrative experiment is conducted to show how the CIF principle improves the density estimation performance.
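The link between Fisher information and the inverse variance of an unbiased estimate, mentioned in this abstract, can be checked on the simplest binary model. The sketch below is an illustration only and is not the paper's CIF procedure: it assumes a single Bernoulli parameter θ, and all function names are hypothetical.

```python
def bernoulli_fisher_information(theta):
    """Fisher information of one Bernoulli(theta) observation.

    Computed as the expected squared score,
    sum over x of p(x | theta) * (d/dtheta log p(x | theta)) ** 2.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    score_one = 1.0 / theta             # d/dtheta log(theta), for x = 1
    score_zero = -1.0 / (1.0 - theta)   # d/dtheta log(1 - theta), for x = 0
    return theta * score_one ** 2 + (1.0 - theta) * score_zero ** 2


def cramer_rao_bound(theta, n):
    """Lower bound on the variance of an unbiased estimate from n i.i.d. draws."""
    return 1.0 / (n * bernoulli_fisher_information(theta))


def sample_mean_variance(theta, n):
    """Exact variance of the sample mean of n i.i.d. Bernoulli(theta) draws."""
    return theta * (1.0 - theta) / n
```

For this model the sample mean attains the bound, so `cramer_rao_bound(theta, n)` and `sample_mean_variance(theta, n)` agree: a parameter with large Fisher information is one that can be estimated with small variance.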
p. 3273-3301
Received: 15 May 2014 / Revised: 10 June 2014 / Accepted: 13 June 2014 / Published: 17 June 2014
| Cited by 1 | PDF Full-text (354 KB) | HTML Full-text | XML Full-text
Abstract: Clustering sets of histograms has become popular thanks to the success of the generic method of bag-of-X used in text categorization and in visual categorization applications. In this paper, we investigate the use of a parametric family of distortion measures, called the α-divergences, for clustering histograms. Since it usually makes sense to deal with symmetric divergences in information retrieval systems, we symmetrize the α-divergences using the concept of mixed divergences. First, we present a novel extension of k-means clustering to mixed divergences. Second, we extend the k-means++ seeding to mixed α-divergences and report a guaranteed probabilistic bound. Finally, we describe a soft clustering technique for mixed α-divergences.
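To make the distortion measures in this abstract concrete, here is a minimal sketch of an α-divergence between positive histograms and of its symmetrization through a mixed divergence. The abstract gives no formulas, so the definitions are assumptions: the α-divergence is written in Amari's form for positive arrays (α ≠ ±1), and the mixed divergence is taken to be λ·D(p : q) + (1 − λ)·D(q : r). All function names are hypothetical.

```python
def alpha_divergence(p, q, alpha):
    """Amari alpha-divergence between strictly positive histograms (alpha != +-1).

    Assumed form:
    D_alpha(p : q) = 4 / (1 - alpha^2)
        * sum_i [ (1-alpha)/2 * p_i + (1+alpha)/2 * q_i
                  - p_i^((1-alpha)/2) * q_i^((1+alpha)/2) ].
    """
    if abs(abs(alpha) - 1.0) < 1e-12:
        raise ValueError("alpha = +-1 are limit cases (KL divergences), not handled here")
    a = (1.0 - alpha) / 2.0
    b = (1.0 + alpha) / 2.0
    total = sum(a * pi + b * qi - (pi ** a) * (qi ** b) for pi, qi in zip(p, q))
    return 4.0 / (1.0 - alpha ** 2) * total


def mixed_divergence(p, q, r, alpha, lam):
    """Assumed mixed divergence: lam * D(p : q) + (1 - lam) * D(q : r)."""
    return (lam * alpha_divergence(p, q, alpha)
            + (1.0 - lam) * alpha_divergence(q, r, alpha))


def symmetrized_alpha_divergence(p, q, alpha):
    """Symmetric special case: r = p and lam = 1/2."""
    return mixed_divergence(p, q, p, alpha, 0.5)
```

With these definitions, swapping the arguments is the same as flipping the sign of α, that is D_α(p : q) = D_{−α}(q : p), which is why averaging the two orientations yields a symmetric distortion.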
p. 3207-3233
Received: 31 March 2014 / Revised: 18 May 2014 / Accepted: 29 May 2014 / Published: 6 June 2014
| PDF Full-text (354 KB) | HTML Full-text | XML Full-text
Abstract: We consider three different approaches to define natural Riemannian metrics on polytopes of stochastic matrices. First, we define a natural class of stochastic maps between these polytopes and give a metric characterization of Chentsov type in terms of invariance with respect to these maps. Second, we consider the Fisher metric defined on arbitrary polytopes through their embeddings as exponential families in the probability simplex. We show that these metrics can also be characterized by an invariance principle with respect to morphisms of exponential families. Third, we consider the Fisher metric resulting from embedding the polytope of stochastic matrices in a simplex of joint distributions by specifying a marginal distribution. All three approaches result in slight variations of products of Fisher metrics. This is consistent with the nature of polytopes of stochastic matrices, which are Cartesian products of probability simplices. The first approach yields a scaled product of Fisher metrics; the second, a product of Fisher metrics; and the third, a product of Fisher metrics scaled by the marginal distribution.
p. 3074-3102
Received: 29 March 2014 / Revised: 23 May 2014 / Accepted: 28 May 2014 / Published: 3 June 2014
| Cited by 5 | PDF Full-text (611 KB) | HTML Full-text | XML Full-text
Abstract: Recent work incorporating geometric ideas in Markov chain Monte Carlo is reviewed in order to highlight these advances and their possible application in a range of domains beyond statistics. A full exposition of Markov chains and their use in Monte Carlo simulation for statistical inference and molecular dynamics is provided, with particular emphasis on methods based on Langevin diffusions. After this, geometric concepts in Markov chain Monte Carlo are introduced. A full derivation of the Langevin diffusion on a Riemannian manifold is given, together with a discussion of the appropriate Riemannian metric choice for different problems. A survey of applications is provided, and some open questions are discussed.
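For orientation, the sketch below shows a Langevin-based sampler of the kind this review builds on: a one-dimensional Metropolis-adjusted Langevin algorithm (MALA) with a flat Euclidean metric. It is not the Riemannian-manifold variant derived in the paper, and the function name, the step size and the Gaussian target in the usage note are assumptions chosen for illustration.

```python
import math
import random


def mala(log_density, grad_log_density, x0, step, n_samples, seed=0):
    """One-dimensional MALA with a constant (Euclidean) metric.

    Proposal: x' = x + (step^2 / 2) * grad log pi(x) + step * xi, xi ~ N(0, 1),
    followed by a Metropolis-Hastings accept/reject correction.
    """
    rng = random.Random(seed)

    def drift_mean(x):
        return x + 0.5 * step ** 2 * grad_log_density(x)

    def log_q(y, x_from):
        # Log proposal density of y given x_from, up to an additive constant.
        return -((y - drift_mean(x_from)) ** 2) / (2.0 * step ** 2)

    x = x0
    samples = []
    for _ in range(n_samples):
        proposal = drift_mean(x) + step * rng.gauss(0.0, 1.0)
        log_accept = (log_density(proposal) - log_density(x)
                      + log_q(x, proposal) - log_q(proposal, x))
        if rng.random() < math.exp(min(0.0, log_accept)):
            x = proposal
        samples.append(x)
    return samples
```

For a standard normal target one could call `mala(lambda x: -0.5 * x * x, lambda x: -x, 0.0, 1.0, 20000)`; the sample mean and variance should then be close to 0 and 1. A position-dependent metric would change both the drift and the proposal covariance, which this sketch does not attempt.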
p. 3026-3048
Received: 28 March 2014 / Revised: 9 May 2014 / Accepted: 22 May 2014 / Published: 28 May 2014
| PDF Full-text (293 KB) | HTML Full-text | XML Full-text
Abstract: We investigate the asymptotic construction of constant-risk Bayesian predictive densities under the Kullback–Leibler risk when the distributions of data and target variables are different and have a common unknown parameter. It is known that the Kullback–Leibler risk is asymptotically equal to a trace of the product of two matrices: the inverse of the Fisher information matrix for the data and the Fisher information matrix for the target variables. We assume that the trace has a unique maximum point with respect to the parameter. We construct asymptotically constant-risk Bayesian predictive densities using a prior depending on the sample size. Further, we apply the theory to the subminimax estimator problem and the prediction based on the binary regression model.
p. 2944-2958
Received: 1 April 2014 / Revised: 21 May 2014 / Accepted: 22 May 2014 / Published: 26 May 2014
| Cited by 3 | PDF Full-text (237 KB) | HTML Full-text | XML Full-text
Abstract: We evaluate the information geometric complexity of entropic motion on low-dimensional Gaussian statistical manifolds in order to quantify how difficult it is to make macroscopic predictions about systems in the presence of limited information. Specifically, we observe that the complexity of such entropic inferences depends not only on the amount of available information but also on the manner in which the pieces of information are correlated. Finally, we find that, for certain correlational structures, the impossibility of reaching the most favorable configuration from an entropic inference viewpoint seems to lead to an information geometric analog of the well-known frustration effect that occurs in statistical physics.
p. 2472-2487
Received: 13 December 2013 / Revised: 21 April 2014 / Accepted: 25 April 2014 / Published: 6 May 2014
| Cited by 6 | PDF Full-text (220 KB) | HTML Full-text | XML Full-text
Abstract: In this paper, we introduce a geometry called F-geometry on a statistical manifold S using an embedding F of S into the space $R_{X}$ of random variables. Amari’s α-geometry is a special case of F-geometry. Then, using the embedding F and a positive smooth function G, we introduce the (F,G)-metric and (F,G)-connections, which enable one to consider a weighted Fisher information metric and weighted connections. The necessary and sufficient condition for two (F,G)-connections to be dual with respect to the (F,G)-metric is obtained. Then we show that Amari’s 0-connection is the only self-dual F-connection with respect to the Fisher information metric. Invariance properties of the geometric structures are discussed, proving that Amari’s α-connections are the only F-connections that are invariant under smooth one-to-one transformations of the random variables.
p. 2454-2471
Received: 27 March 2014 / Revised: 25 April 2014 / Accepted: 29 April 2014 / Published: 2 May 2014
| Cited by 2 | PDF Full-text (735 KB) | HTML Full-text | XML Full-text
Abstract: A broad view of the nature and potential of computational information geometry in statistics is offered. This new area suitably extends the manifold-based approach of classical information geometry to a simplicial setting, in order to obtain an operational universal model space. Additional underlying theory and illustrative real examples are presented. In the infinite-dimensional case, challenges inherent in this ambitious overall agenda are highlighted and promising new methodologies indicated.
p. 2131-2145
Received: 14 February 2014 / Revised: 9 April 2014 / Accepted: 10 April 2014 / Published: 14 April 2014
| Cited by 5 | PDF Full-text (246 KB) | HTML Full-text | XML Full-text
Abstract: Information geometry studies the dually flat structure of a manifold, highlighted by the generalized Pythagorean theorem. The present paper studies a class of Bregman divergences called the (ρ,τ)-divergence. A (ρ,τ)-divergence generates a dually flat structure in the manifold of positive measures, as well as in the manifold of positive-definite matrices. The class is composed of decomposable divergences, which are written as a sum of componentwise divergences. Conversely, a decomposable dually flat divergence is shown to be a (ρ,τ)-divergence. A (ρ,τ)-divergence is determined from two monotone scalar functions, ρ and τ. The class includes the KL-divergence, α-, β- and (α, β)-divergences as special cases. The transformation between an affine parameter and its dual is easily calculated in the case of a decomposable divergence. Therefore, such a divergence is useful for obtaining the center for a cluster of points, which will be applied to classification and information retrieval in vision. For the manifold of positive-definite matrices, in addition to dual flatness and decomposability, we require invariance under linear transformations, in particular under orthogonal transformations. This opens a way to define a new class of divergences, called the (ρ,τ)-structure in the manifold of positive-definite matrices.
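The statement that the KL-divergence is a special case of a decomposable Bregman divergence can be checked numerically. The sketch below does not implement the paper's (ρ,τ)-construction; it only uses the textbook Bregman form B(p, q) = φ(p) − φ(q) − ⟨∇φ(q), p − q⟩ with the assumed generator φ(x) = Σ (x_i log x_i − x_i) on positive measures. All function names are hypothetical.

```python
import math


def bregman_divergence(phi, grad_phi, p, q):
    """Bregman divergence B(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_phi(q), p, q))
    return phi(p) - phi(q) - inner


def neg_entropy(x):
    """Assumed convex generator: sum_i (x_i log x_i - x_i), for x_i > 0."""
    return sum(xi * math.log(xi) - xi for xi in x)


def grad_neg_entropy(x):
    """Gradient of neg_entropy: componentwise log."""
    return [math.log(xi) for xi in x]


def generalized_kl(p, q):
    """KL-divergence extended to positive measures (no normalization assumed)."""
    return sum(pi * math.log(pi / qi) - pi + qi for pi, qi in zip(p, q))
```

For positive vectors such as p = [1.0, 2.0, 0.5] and q = [0.5, 1.5, 2.0], `bregman_divergence(neg_entropy, grad_neg_entropy, p, q)` and `generalized_kl(p, q)` return the same value up to rounding. Because the generator is a sum over components, the divergence is decomposable in the sense described in the abstract.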
p. 2023-2055
Received: 12 February 2014 / Revised: 11 March 2014 / Accepted: 24 March 2014 / Published: 8 April 2014
| PDF Full-text (365 KB) | HTML Full-text | XML Full-text
Abstract: In this survey paper, a summary of results found in a series of papers is presented. The subject of interest is the matrix algebraic properties of the Fisher information matrix (FIM) of stationary processes. The FIM is an ingredient of the Cramér–Rao inequality, and belongs to the basics of asymptotic estimation theory in mathematical statistics. The FIM is interconnected with the Sylvester, Bezout and tensor Sylvester matrices. Through these interconnections it is shown that the FIM of scalar and multiple stationary processes fulfills the resultant matrix property. A statistical distance measure involving entries of the FIM is presented. In quantum information, a different statistical distance measure is set forth. It is related to the Fisher information, but considers the information about one parameter in a particular measurement procedure. The FIM of scalar stationary processes is also interconnected with the solutions of appropriate Stein equations; conditions for the FIM to verify certain Stein equations are formulated. The presence of Vandermonde matrices is also emphasized.
p. 1002-1036
Received: 4 December 2013 / Accepted: 30 January 2014 / Published: 18 February 2014
| Cited by 3 | PDF Full-text (6192 KB) | HTML Full-text | XML Full-text
Abstract: Markov random field models are powerful tools for the study of complex systems. However, little is known about how the interactions between the elements of such systems are encoded, especially from an information-theoretic perspective. In this paper, our goal is to elucidate the connection between Fisher information, Shannon entropy, information geometry and the behavior of complex systems modeled by isotropic pairwise Gaussian Markov random fields. We propose analytical expressions to compute local and global versions of these measures using Besag’s pseudo-likelihood function, characterizing the system’s behavior through its Fisher curve, a parametric trajectory across the information space that provides a geometric representation for the study of complex systems in which temperature deviates from infinity. Computational experiments show how the proposed tools can be useful in extracting relevant information from complex patterns. The obtained results quantify and support our main conclusion: in terms of information, moving towards higher entropy states (A → B) is different from moving towards lower entropy states (B → A), since the Fisher curves are not the same, given a natural orientation (the direction of time).