Displaying article 1-18
p. 5673-5694
Received: 11 May 2015 / Revised: 31 July 2015 / Accepted: 3 August 2015 / Published: 6 August 2015
Abstract: In this paper, we investigate the basic properties of binary classification with a pseudo model based on the Itakura–Saito distance and reveal that the Itakura–Saito distance is a unique appropriate measure for estimation with the pseudo model in the framework of general Bregman divergence. Furthermore, we propose a novel multi-task learning algorithm based on the pseudo model in the framework of the ensemble learning method. We focus on a specific setting of multi-task learning for binary classification problems. The set of features is assumed to be common among all tasks, which are our targets of performance improvement. We consider a situation where the shared structures among the datasets are represented by divergence between underlying distributions associated with multiple tasks. We discuss statistical properties of the proposed method and investigate its validity with numerical experiments.
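As a small illustration of the distance named in this abstract, the sketch below computes the Itakura–Saito distance between two positive sequences and checks it against the Bregman divergence generated by \(\phi(x) = -\log x\). The function names are hypothetical and the code is not taken from the paper; it only shows the standard textbook definition.

```python
import math


def itakura_saito(p, q):
    """Itakura-Saito distance: sum of p/q - log(p/q) - 1 over positive entries."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi <= 0 or qi <= 0:
            raise ValueError("all entries must be strictly positive")
        ratio = pi / qi
        total += ratio - math.log(ratio) - 1.0
    return total


def bregman_neg_log(p, q):
    """Bregman divergence for phi(x) = -log(x): phi(p) - phi(q) - phi'(q) * (p - q)."""
    total = 0.0
    for pi, qi in zip(p, q):
        # phi'(q) = -1/q, so the linear term -phi'(q) * (p - q) becomes (p - q) / q.
        total += -math.log(pi) + math.log(qi) + (pi - qi) / qi
    return total


print(itakura_saito([2.0], [1.0]))  # distance from 2 to 1
print(itakura_saito([1.0], [2.0]))  # distance from 1 to 2 (not symmetric)
```

The two printed values differ, which shows that the Itakura–Saito distance is not symmetric in its arguments.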
p. 4602-4626
Received: 3 April 2015 / Revised: 20 June 2015 / Accepted: 25 June 2015 / Published: 1 July 2015
Cited by 1
Abstract: In regression analysis for deriving scaling laws that occur in various scientific disciplines, usually standard regression methods have been applied, of which ordinary least squares (OLS) is the most popular. In many situations, the assumptions underlying OLS are not fulfilled, and several other approaches have been proposed. However, most techniques address only part of the shortcomings of OLS. We here discuss a new and more general regression method, which we call geodesic least squares regression (GLS). The method is based on minimization of the Rao geodesic distance on a probabilistic manifold. For the case of a power law, we demonstrate the robustness of the method on synthetic data in the presence of significant uncertainty on both the data and the regression model. We then show good performance of the method in an application to a scaling law in magnetic confinement fusion.
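The abstract does not say which probability models the Rao geodesic distance is computed between. As a hedged illustration only, the sketch below assumes univariate Gaussian distributions, for which the Fisher–Rao distance has a well-known closed form. The function name `rao_distance_normal` is hypothetical and the code is not the authors' GLS implementation.

```python
import math


def rao_distance_normal(mu1, sigma1, mu2, sigma2):
    """Fisher-Rao geodesic distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    Assumes the univariate Gaussian family, whose Fisher metric is
    ds^2 = (dmu^2 + 2 dsigma^2) / sigma^2 (a scaled hyperbolic half-plane).
    """
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be strictly positive")
    arg = 1.0 + ((mu1 - mu2) ** 2 / 2.0 + (sigma1 - sigma2) ** 2) / (2.0 * sigma1 * sigma2)
    # Guard against rounding slightly below 1, where acosh is undefined.
    return math.sqrt(2.0) * math.acosh(max(1.0, arg))


# With equal means the distance reduces to sqrt(2) * |log(sigma2 / sigma1)|.
print(rao_distance_normal(0.0, 1.0, 0.0, math.e))
```

Unlike a Euclidean distance between parameters, this distance depends on the ratio of the standard deviations, so the same shift in the mean counts for less when the uncertainty is larger.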
p. 4485-4499
Received: 27 January 2015 / Revised: 28 February 2015 / Accepted: 17 March 2015 / Published: 25 June 2015
Cited by 3
Abstract: A paper was published (Harsha and Subrahamanian Moosath, 2014) in which the authors claimed to have discovered an extension to Amari's \(\alpha\)-geometry through a general monotone embedding function. It will be pointed out here that this so-called \((F, G)\)-geometry (which includes \(F\)-geometry as a special case) is identical to Zhang's (2004) extension to the \(\alpha\)-geometry, where the pair of monotone embedding functions was named \(\rho\) and \(\tau\) instead of the \(F\) and \(H\) used in Harsha and Subrahamanian Moosath (2014). Their weighting function \(G\) for the Riemannian metric appears only cosmetically, owing to a rewrite of the score function in log-representation as opposed to the \((\rho, \tau)\)-representation in Zhang (2004). It is further shown here that the resulting metric and \(\alpha\)-connections obtained by Zhang (2004) through arbitrary monotone embeddings are a unique extension of the \(\alpha\)-geometric structure. As a special case, Naudts' (2004) \(\phi\)-logarithm embedding (using the so-called \(\log_\phi\) function) is recovered with the identification \(\rho=\phi, \, \tau=\log_\phi\), with the \(\phi\)-exponential \(\exp_\phi\) given by the associated convex function linking the two representations.
p. 4215-4254
Received: 31 January 2015 / Revised: 21 May 2015 / Accepted: 2 June 2015 / Published: 18 June 2015
Abstract: In this paper, we study Amari’s natural gradient flows of real functions defined on the densities belonging to an exponential family on a finite sample space. Our main example is the minimization of the expected value of a real function defined on the sample space. In such a case, the natural gradient flow converges to densities with reduced support that belong to the border of the exponential family. We have suggested in previous works to use the natural gradient evaluated in the mixture geometry. Here, we show that in some cases, the differential equation can be extended to a bigger domain in such a way that the densities at the border of the exponential family are actually internal points in the extended problem. The extension is based on the algebraic concept of an exponential variety. We study in full detail a toy example and obtain positive partial results in the important case of a binary sample space.
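The convergence to densities with reduced support can be seen in the simplest setting. The sketch below is an assumption-laden illustration, not the paper's construction: it takes the full probability simplex on a finite sample space as the model, where the Fisher natural gradient flow for minimizing \(E_p[f]\) is the replicator-type equation \(\dot p_i = -p_i (f_i - E_p[f])\), and integrates it with a plain Euler scheme. The function name, step size and step count are hypothetical choices.

```python
def natural_gradient_flow(f, p0, dt=0.1, steps=2000):
    """Euler integration of dp_i/dt = -p_i * (f_i - E_p[f]) on the simplex.

    f  : values of the function on the finite sample space
    p0 : strictly positive starting probabilities summing to 1
    """
    p = list(p0)
    for _ in range(steps):
        mean_f = sum(pi * fi for pi, fi in zip(p, f))
        # The update preserves the total mass because sum_i p_i (f_i - E[f]) = 0.
        p = [pi - dt * pi * (fi - mean_f) for pi, fi in zip(p, f)]
        # Renormalize to remove accumulated floating-point drift.
        total = sum(p)
        p = [pi / total for pi in p]
    return p


p_final = natural_gradient_flow([3.0, 1.0, 2.0], [1 / 3, 1 / 3, 1 / 3])
print(p_final)  # mass concentrates on the sample point where f is smallest
```

The limiting density puts essentially all of its mass on one sample point, so it lies on the border of the simplex rather than in its interior.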
p. 3989-4027
Received: 20 November 2014 / Revised: 4 May 2015 / Accepted: 5 May 2015 / Published: 12 June 2015
Cited by 1
Abstract: This review article first surveys the main inference tools: Bayes rule, the maximum entropy principle (MEP), information theory, relative entropy and the Kullback–Leibler (KL) divergence, and Fisher information with its corresponding geometries. For each of these tools, the precise context of its use is described. The second part of the paper focuses on the ways these tools have been used in data, signal and image processing and in the inverse problems that arise in different physical sciences and engineering applications. A few examples of the applications are described: entropy in independent components analysis (ICA) and in blind source separation, Fisher information in data model selection, different maximum entropy-based methods in time series spectral estimation and in linear inverse problems and, finally, Bayesian inference for general inverse problems. Some original material concerning approximate Bayesian computation (ABC) and, in particular, the variational Bayesian approximation (VBA) methods is also presented. VBA is used to propose an alternative Bayesian computational tool to the classical Markov chain Monte Carlo (MCMC) methods. We will also see that VBA encompasses joint maximum a posteriori (MAP) estimation, as well as the different expectation-maximization (EM) algorithms, as particular cases.
p. 3963-3988
Received: 31 January 2015 / Revised: 13 May 2015 / Accepted: 4 June 2015 / Published: 11 June 2015
Abstract: The paper proposes a new non-parametric density estimator from region-censored observations with application in the context of population studies, where standard maximum likelihood is affected by over-fitting and non-uniqueness problems. It is a maximum entropy estimator that satisfies a set of constraints imposing a close fit to the empirical distributions associated with the set of censoring regions. The degree of relaxation of the data-fit constraints is chosen, such that the likelihood of the inferred model is maximal. In this manner, the estimator is able to overcome the singularity of the non-parametric maximum likelihood estimator and, at the same time, maintains a good fit to the observations. The behavior of the estimator is studied in a simulation, demonstrating its superior performance with respect to the non-parametric maximum likelihood and the importance of carefully choosing the degree of relaxation of the data-fit constraints. In particular, the predictive performance of the resulting estimator is better, which is important when the population analysis is done in the context of risk assessment. We also apply the estimator to real data in the context of the prevention of hyperbaric decompression sickness, where the available observations are formally equivalent to region-censored versions of the variables of interest, confirming that it is a superior alternative to non-parametric maximum likelihood in realistic situations.
p. 3898-3912
Received: 30 March 2015 / Revised: 1 June 2015 / Accepted: 2 June 2015 / Published: 10 June 2015
Cited by 1
Abstract: Based on geometric invariance properties, we derive an explicit prior distribution for the parameters of multivariate linear regression problems in the absence of further prior information. The problem is formulated as a rotationally-invariant distribution of \(L\)-dimensional hyperplanes in \(N\) dimensions, and the associated system of partial differential equations is solved. The derived prior distribution generalizes the already known special cases, e.g., 2D plane in three dimensions.
p. 3253-3318
Received: 31 January 2015 / Revised: 3 May 2015 / Accepted: 5 May 2015 / Published: 13 May 2015
Abstract: We propose that entropy is a universal co-homological class in a theory associated to a family of observable quantities and a family of probability distributions. Three cases are presented: (1) classical probabilities and random variables; (2) quantum probabilities and observable operators; (3) dynamic probabilities and observation trees. This gives rise to a new kind of topology for information processes, that accounts for the main information functions: entropy, mutual-informations at all orders, and Kullback–Leibler divergence and generalizes them in several ways. The article is divided into two parts, that can be read independently. In the first part, the introduction, we provide an overview of the results, some open questions, future results and lines of research, and discuss briefly the application to complex data. In the second part we give the complete definitions and proofs of the theorems A, C and E in the introduction, which show why entropy is the first homological invariant of a structure of information in four contexts: static classical or quantum probability, dynamics of classical or quantum strategies of observation of a finite system.
p. 1850-1881
Received: 31 January 2015 / Revised: 19 March 2015 / Accepted: 20 March 2015 / Published: 31 March 2015
Abstract: In computational anatomy, organ shapes are often modeled as deformations of a reference shape, i.e., as elements of a Lie group. To analyze the variability of the human anatomy in this framework, we need to perform statistics on Lie groups. A Lie group is a manifold with a consistent group structure. Statistics on Riemannian manifolds have been well studied, but to use the statistical Riemannian framework on Lie groups, one needs to define a Riemannian metric compatible with the group structure: a bi-invariant metric. However, it is known that Lie groups that are not a direct product of compact and abelian groups have no bi-invariant metric. What, then, about bi-invariant pseudo-metrics? In other words: could we remove the assumption of the positivity of the metric and obtain consistent statistics on Lie groups through the pseudo-Riemannian framework? Our contribution is two-fold. First, we present an algorithm that constructs bi-invariant pseudo-metrics on a given Lie group, when they exist. Then, by running the algorithm on commonly-used Lie groups, we show that most of them do not admit any bi-invariant (pseudo-) metric. We thus conclude that the (pseudo-) Riemannian setting is too limited for the definition of consistent statistics on general Lie groups.
p. 1814-1849
Received: 28 January 2015 / Revised: 13 March 2015 / Accepted: 13 March 2015 / Published: 30 March 2015
Abstract: The geometry of the Fisher metric and of geodesics on a space of probability measures defined on a compact manifold is discussed and applied to the geometry of a barycenter map associated with the Busemann function on a Hadamard manifold \(X\). We obtain an explicit formula for geodesics and then several theorems on geodesics, one of which asserts that any two probability measures can be joined by a unique geodesic. Using the Fisher metric and the properties of geodesics thus obtained, a fibre space structure of the barycenter map and geodesic properties of each fibre are discussed. Moreover, an isometry problem on a Hadamard manifold \(X\) and its ideal boundary \(\partial X\) (for a given homeomorphism \(\Phi\) of \(\partial X\), find an isometry of \(X\) whose \(\partial X\)-extension coincides with \(\Phi\)) is investigated in terms of the barycenter map.
p. 1581-1605
Received: 16 January 2015 / Revised: 13 March 2015 / Accepted: 20 March 2015 / Published: 25 March 2015
Abstract: We prove the correspondence between the information geometry of a signal filter and a Kähler manifold. The information geometry of a minimum-phase linear system with a finite complex cepstrum norm is a Kähler manifold. The square of the complex cepstrum norm of the signal filter corresponds to the Kähler potential. The Hermitian structure of the Kähler manifold is explicitly emergent if and only if the impulse response function of the highest degree in z is constant in model parameters. The Kählerian information geometry takes advantage of more efficient calculation steps for the metric tensor and the Ricci tensor. Moreover, the α-generalization of the geometric tensors is linear in α. It is also a robust setting for finding Bayesian predictive priors, such as superharmonic priors, because Laplace–Beltrami operators on Kähler manifolds take much simpler forms than those on non-Kähler manifolds. Several time series models are studied in the Kählerian information geometry.
p. 1347-1357
Received: 16 January 2015 / Revised: 11 March 2015 / Accepted: 12 March 2015 / Published: 17 March 2015
Abstract: We construct geometric shrinkage priors for Kählerian signal filters. Based on the characteristics of Kähler manifolds, an efficient and robust algorithm for finding superharmonic priors which outperform the Jeffreys prior is introduced. Several ansätze for the Bayesian predictive priors are also suggested. In particular, the ansätze related to Kähler potential are geometrically intrinsic priors to the information manifold of which the geometry is derived from the potential. The implication of the algorithm to time series models is also provided.
p. 1273-1277
Received: 2 February 2015 / Revised: 9 March 2015 / Accepted: 10 March 2015 / Published: 13 March 2015
Abstract: In this discussion, we indicate possibilities for (homological and non-homological) linearization of basic notions of the probability theory and also for replacing the real numbers as values of probabilities by objects of suitable combinatorial categories.
p. 1165-1180
Received: 30 December 2014 / Revised: 5 March 2015 / Accepted: 9 March 2015 / Published: 12 March 2015
Abstract: We present an application of distributed consensus algorithms to metamorphic systems. A metamorphic system is a set of identical units that can self-assemble to form a rigid structure. For instance, one can think of a robotic arm composed of multiple links connected by joints. The system can change its shape in order to adapt to different environments via reconfiguration of its constituting units. We assume in this work that several metamorphic systems form a network: two systems are connected whenever they are able to communicate with each other. The aim of this paper is to propose a distributed algorithm that synchronizes all of the systems in the network. Synchronizing means that all of the systems should end up having the same configuration. This aim is achieved in two steps: (i) we cast the problem as a consensus problem on a metric space; and (ii) we use a recent distributed consensus algorithm that only makes use of metrical notions.
p. 304-345
Received: 8 October 2014 / Accepted: 7 January 2015 / Published: 13 January 2015
Cited by 2
Abstract: Information geometric optimization (IGO) is a general framework for stochastic optimization problems aiming at limiting the influence of arbitrary parametrization choices: the initial problem is transformed into the optimization of a smooth function on a Riemannian manifold, defining a parametrization-invariant first order differential equation and, thus, yielding an approximately parametrization-invariant algorithm (up to second order in the step size). We define the geodesic IGO update, a fully parametrization-invariant algorithm using the Riemannian structure, and we compute it for the manifold of Gaussians, thanks to Noether’s theorem. However, in similar algorithms, such as CMA-ES (Covariance Matrix Adaptation - Evolution Strategy) and xNES (exponential Natural Evolution Strategy), the time steps for the mean and the covariance are decoupled. We suggest two ways of doing so: twisted geodesic IGO (GIGO) and blockwise GIGO. Finally, we show that while the xNES algorithm is not GIGO, it is an instance of blockwise GIGO applied to the mean and covariance matrix separately. Therefore, xNES has an almost parametrization-invariant description.
p. 5876-5890
Received: 21 August 2014 / Revised: 28 October 2014 / Accepted: 4 November 2014 / Published: 6 November 2014
Abstract: In the last decades of the nineteenth century, different attitudes towards mechanics led to two main theoretical approaches to thermodynamics: an abstract and phenomenological approach, and a very different approach in terms of microscopic models. In reality, some intermediate solutions were also put forward. Helmholtz and Planck relied on a mere complementarity between mechanical and thermal variables in the expressions of state functions, and Oettingen explored the possibility of a more demanding symmetry between mechanical and thermal capacities. Planck refused microscopic interpretations of heat, whereas Helmholtz also had recourse to a Lagrangian approach involving fast hidden motions. J.J. Thomson incorporated the two mechanical attitudes in his theoretical framework, and put forward a very general theory for physical and chemical processes. He made use of two sets of Lagrangian coordinates that corresponded to two components of kinetic energy: alongside macroscopic energy, there was a microscopic energy, which was associated with the absolute temperature. Duhem put forward a bold design of unification between physics and chemistry, which was based on the two principles of thermodynamics. From the mathematical point of view, his thermodynamics or energetics consisted of a Lagrangian generalization of mechanics that could potentially describe every kind of irreversible process, explosive chemical reactions included.
p. 4892-4910
Received: 23 July 2014 / Revised: 18 August 2014 / Accepted: 28 August 2014 / Published: 10 September 2014
Abstract: In the information theory community, the following “historical” statements are generally well accepted: (1) Hartley put forth his rule twenty years before Shannon; (2) Shannon’s formula as a fundamental tradeoff between transmission rate, bandwidth, and signal-to-noise ratio came unexpectedly in 1948; (3) Hartley’s rule is inexact while Shannon’s formula is characteristic of the additive white Gaussian noise channel; (4) Hartley’s rule is an imprecise relation that is not an appropriate formula for the capacity of a communication channel. We show that all four of these statements are somewhat wrong. In fact, a careful calculation shows that “Hartley’s rule” coincides with Shannon’s formula. We explain this mathematical coincidence by deriving the necessary and sufficient conditions on an additive noise channel such that its capacity is given by Shannon’s formula, and we construct a sequence of such channels that makes the link between the uniform (Hartley) and Gaussian (Shannon) channels.
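The coincidence stated in this abstract can be checked with a few lines of arithmetic. The sketch below rests on assumptions that are not spelled out in the abstract: Hartley's rule is taken as \(\log_2(1 + A/\Delta)\) bits per sample, the input uses \(M = 1 + A/\Delta\) equiprobable levels spaced \(2\Delta\) apart in \([-A, A]\), and the noise is uniform on \([-\Delta, \Delta]\). All function names are hypothetical.

```python
import math


def hartley_rate(A, delta):
    """Hartley's rule, assumed here as log2(1 + A/delta) bits per sample."""
    return math.log2(1.0 + A / delta)


def shannon_rate(P, N):
    """Shannon's formula: 0.5 * log2(1 + P/N) bits per sample."""
    return 0.5 * math.log2(1.0 + P / N)


def uniform_channel_powers(A, delta):
    """Signal power P and noise power N under the assumptions stated above."""
    M = round(A / delta) + 1  # number of levels; A/delta is assumed to be an integer
    levels = [-A + 2.0 * delta * k for k in range(M)]  # symmetric levels, zero mean
    P = sum(x * x for x in levels) / M  # power of the equiprobable input
    N = delta * delta / 3.0  # variance of noise uniform on [-delta, delta]
    return P, N


A, delta = 3.0, 1.0
P, N = uniform_channel_powers(A, delta)
print(hartley_rate(A, delta), shannon_rate(P, N))
```

Under these assumptions \(P/N = M^2 - 1\), so \(\tfrac{1}{2}\log_2(1 + P/N) = \log_2 M\), which is why the two printed rates agree.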
p. 4521-4565
Received: 30 March 2014 / Revised: 11 June 2014 / Accepted: 23 June 2014 / Published: 12 August 2014
Cited by 9
Abstract: François Massieu's 1869 idea of deriving some mechanical and thermal properties of physical systems from “Characteristic Functions” was developed by Gibbs and Duhem in thermodynamics with the concept of potentials, and introduced by Poincaré in probability. This paper deals with the generalization of this Characteristic Function concept by Jean-Louis Koszul in Mathematics and by Jean-Marie Souriau in Statistical Physics. The Koszul-Vinberg Characteristic Function (KVCF) on convex cones will be presented as a cornerstone of “Information Geometry” theory, defining Koszul Entropy as the Legendre transform of minus the logarithm of the KVCF, and Fisher Information Metrics as the Hessian of these dual functions, invariant under their automorphisms. In parallel, Souriau extended the Characteristic Function in Statistical Physics, looking for other kinds of invariances through the co-adjoint action of a group on its momentum space, defining physical observables like energy, heat and momentum as pure geometrical objects. In the covariant Souriau model, Gibbs equilibrium states are indexed by a geometric parameter, the Geometric (Planck) Temperature, with values in the Lie algebra of the dynamical Galileo/Poincaré groups, interpreted as a space-time vector, giving the metric tensor a null Lie derivative. The Fisher Information metric appears as the opposite of the derivative of the Mean “Moment map” by geometric temperature, equivalent to a Geometric Capacity or Specific Heat. We will synthesize the analogies between the Koszul and Souriau models, and will reduce their definitions to the exclusive Cartan “Inner Product”. Interpreting the Legendre transform as a Fourier transform in \((\mathrm{Min},+)\) algebra, we conclude with a definition of Entropy given by a relation mixing Fourier/Laplace transforms: \(\mathrm{Entropy} = (\text{minus})\ \mathrm{Fourier}_{(\mathrm{Min},+)} \circ \mathrm{Log} \circ \mathrm{Laplace}_{(+,\times)}\).