Proceeding Paper

# Information Geometry Conflicts With Independence †

by
John Skilling
Maximum Entropy Data Consultants Ltd., Kenmare, Ireland
Presented at the 39th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, Garching, Germany, 30 June–5 July 2019.
Proceedings 2019, 33(1), 20; https://doi.org/10.3390/proceedings2019033020
Published: 5 December 2019

## Abstract

Information Geometry conflicts with the independence that is required for science and for rational inference generally.

PACS: 02.50.Cw; 02.70.Rr

## 1. Introduction

Information Geometry [1] assigns a geometrical relationship between probability distributions, using the local curvature (Hessian) of the Kullback-Leibler formula

$$H(p;q) = \sum_i p(i)\,\log\frac{p(i)}{q(i)} \tag{1}$$

as the covariant geometrical metric tensor [2,3] between $q$ and $p$. On an $n$-dimensional manifold $p(\theta)$ specified by parameters $\theta_1,\dots,\theta_n$, this $n\times n$ Riemannian metric $g$ is the Fisher information

$$g_{jk} = \sum_i \frac{1}{p(i)}\,\frac{\partial p(i)}{\partial\theta_j}\,\frac{\partial p(i)}{\partial\theta_k}. \tag{2}$$

Geodesic lengths and invariant volumes $V$ follow from $(d\ell)^2 = \sum_{jk} g_{jk}\,d\theta_j\,d\theta_k$ and $dV = \sqrt{\det g}\;d^n\theta$.
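As a numerical sanity check (an illustrative sketch, not part of the paper), the Hessian definition and the Fisher-information form of the metric can be compared for a two-point (Bernoulli) family, where both reduce to $1/(\theta(1-\theta))$:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence H(p; q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def bernoulli(theta):
    return (theta, 1.0 - theta)

theta, h = 0.3, 1e-4

# Metric from the Fisher formula g = sum_i (dp_i/dtheta)^2 / p_i :
# derivatives of (theta, 1 - theta) are (+1, -1), so g = 1/theta + 1/(1-theta).
g_fisher = 1.0 / theta + 1.0 / (1.0 - theta)

# Metric as the local curvature (Hessian) of H(p_theta ; p_theta'):
# central second difference in theta' at theta' = theta.
p = bernoulli(theta)
hessian = (kl(p, bernoulli(theta + h)) - 2.0 * kl(p, bernoulli(theta))
           + kl(p, bernoulli(theta - h))) / h**2

print(g_fisher, hessian)   # both approximately 4.7619 = 1/(0.3 * 0.7)
```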
Necessarily, lengths are symmetric between source and destination, $\ell(p,q) = \ell(q,p)$, so cannot be isomorphic to $H$, which is from-to asymmetric. Yet (1) is the only connection which preserves independence of separate distributions, $H(x\times p\,;\,y\times q) = H(x;y) + H(p;q)$. Specifically, when $H$ is used to assign an optimal $p$ (meaning minimally distorted from $q$) under constraints, that “maximum entropy” selection interferes with the separate optimisation of $x$-from-$y$ unless $H$ has the form (1) [4,5].
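The additivity of $H$ over independent products can be confirmed numerically; the following sketch (not from the paper) builds product distributions and compares the two sides:

```python
import math
from itertools import product

def kl(p, q):
    """Kullback-Leibler divergence H(p; q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def outer(a, b):
    """Product distribution a x b, flattened to a tuple."""
    return tuple(ai * bj for ai, bj in product(a, b))

x, y = (0.2, 0.8), (0.5, 0.5)
p, q = (0.1, 0.6, 0.3), (0.3, 0.3, 0.4)

lhs = kl(outer(x, p), outer(y, q))   # H(x*p ; y*q)
rhs = kl(x, y) + kl(p, q)            # H(x;y) + H(p;q)
print(lhs, rhs)                      # the same value, up to rounding
```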
It follows that any imposed geometrical connection must introduce interference between supposedly separate distributions. That behaviour is incompatible with the practice of scientific inference, and is confirmed by a counter-example.

## 2. Counter-Example

Consider the 2-parameter family of probability distributions [6]

$$p_{vw}\big(t\ (\mathrm{mod}\ 1)\big) = \begin{cases} f\!\left(\dfrac{t-v}{w}\right) & \text{for } v < t < v+w,\\[6pt] f\!\left(\dfrac{1+v-t}{1-w}\right) & \text{for } v+w < t < v+1. \end{cases} \tag{3}$$

Parameters $v$ (location) and $w$ (width) lie between 0 and 1. The function $f$ (Figure 1) is monotonically increasing, so that $p_{vw}$ rises from $f(0)$ at $t=v$ to $f(1)$ at $t = v+w\ (\mathrm{mod}\ 1)$ before falling back to $f(0)$ at $t = v+1\ (\mathrm{mod}\ 1)$. It is positive and normalised to $\int_0^1 f(u)\,du = 1$, so that the $p_{vw}(\cdot)$'s can be probability distributions on the interval $(0,1)$, which could model growth and decay in a periodic system.
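As a quick check (an illustrative sketch, not part of the paper), the family (3) can be coded with the cubic example $f(u) = (8+6u^2-4u^3)/9$ used later; it is normalised, and $v$ acts as a pure translation of $t\ (\mathrm{mod}\ 1)$:

```python
import math

def f(u):
    """Example shape function: positive, increasing, normalised on (0, 1)."""
    return (8.0 + 6.0 * u**2 - 4.0 * u**3) / 9.0

def p(t, v, w):
    """Family (3): rise over (v, v+w), fall over (v+w, v+1), wrapped mod 1."""
    s = (t - v) % 1.0
    if s < w:
        return f(s / w)
    return f((1.0 - s) / (1.0 - w))

v, w, n = 0.37, 0.6, 200_000

# Normalisation: integrate p over one period with the midpoint rule.
integral = sum(p((k + 0.5) / n, v, w) for k in range(n)) / n
print(integral)   # approximately 1.0

# v is a pure translation: p_{v,w}(t) = p_{0,w}((t - v) mod 1).
for t in (0.1, 0.5, 0.9):
    assert math.isclose(p(t, v, w), p((t - v) % 1.0, 0.0, w))
```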

#### 2.1. Two Parameters v and w

The $2\times 2$ information-geometry metric evaluates to

$$\begin{pmatrix} g_{vv} & g_{vw} \\ g_{wv} & g_{ww} \end{pmatrix} = \frac{1}{w(1-w)} \int_0^1 \begin{pmatrix} 1 & u \\ u & u^2 \end{pmatrix} \frac{f'(u)^2}{f(u)}\,du = \frac{1}{w(1-w)} \begin{pmatrix} A & B \\ B & C \end{pmatrix} \tag{4}$$

where $A, B, C$ are constants. The table below shows their values for two example functions: the first is easy to integrate, while the second has vanishing slope $f'(0) = f'(1) = 0$ at the joins (as in Figure 1).

|  | $f(u) = e^u/(e-1)$ | $f(u) = (8 + 6u^2 - 4u^3)/9$ |
|---|---|---|
| $A$ | $e - 1 = 1.71828$ | $\tfrac{11}{6}\log 2 + \tfrac{5}{6}\log 5 - \sqrt{15}\,\big(\arctan\tfrac{5}{\sqrt{15}} - \arctan\tfrac{1}{\sqrt{15}}\big) = 0.05945$ |
| $B$ | $1 = 1.00000$ | $\tfrac{89}{12}\log 2 - \tfrac{25}{12}\log 5 - \tfrac{\sqrt{15}}{6}\big(\arctan\tfrac{5}{\sqrt{15}} - \arctan\tfrac{1}{\sqrt{15}}\big) - \tfrac{4}{3} = 0.02909$ |
| $C$ | $e - 2 = 0.71828$ | $\tfrac{251}{24}\log 2 + \tfrac{5}{24}\log 5 + \tfrac{13\sqrt{15}}{12}\big(\arctan\tfrac{5}{\sqrt{15}} - \arctan\tfrac{1}{\sqrt{15}}\big) - \tfrac{31}{3} = 0.01636$ |
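The constants for the cubic example can be reproduced by direct quadrature of $A=\int_0^1 f'^2/f\,du$, $B=\int_0^1 u\,f'^2/f\,du$, $C=\int_0^1 u^2 f'^2/f\,du$; this sketch (illustrative, not part of the paper) uses composite Simpson integration from the standard library only:

```python
# Numerical check of A, B, C for f(u) = (8 + 6u^2 - 4u^3)/9.

def f(u):
    return (8.0 + 6.0 * u**2 - 4.0 * u**3) / 9.0

def df(u):
    return (12.0 * u - 12.0 * u**2) / 9.0

def simpson(g, n=10_000):
    """Composite Simpson integration of g over (0, 1); n must be even."""
    h = 1.0 / n
    total = g(0.0) + g(1.0)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(k * h)
    return total * h / 3.0

A = simpson(lambda u: df(u)**2 / f(u))
B = simpson(lambda u: u * df(u)**2 / f(u))
C = simpson(lambda u: u**2 * df(u)**2 / f(u))
print(A, B, C)   # approximately 0.05945, 0.02909, 0.01636 (the table values)
```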
The invariant volume element follows as

$$dV = \sqrt{\det g}\;dv\,dw = \frac{\sqrt{AC - B^2}}{w(1-w)}\,dv\,dw \tag{5}$$

where, by construction, $AC - B^2 > 0$. The total invariant volume is infinite:

$$V = \int_0^1 dv \int_0^1 dw\,\sqrt{\det g} = \infty. \tag{6}$$
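The divergence in (6) comes from the $w$-integral: $\int dw/(w(1-w))$ has a non-integrable singularity at each endpoint, with the truncated integral equal to $2\log\!\big((1-\varepsilon)/\varepsilon\big)$. A quick sketch (illustrative, not part of the paper) shows it growing without bound:

```python
import math

# Truncated w-integral of 1/(w(1-w)) over (eps, 1-eps) by the midpoint rule;
# the closed form 2*log((1-eps)/eps) diverges as eps -> 0, so (6) is infinite.
def truncated(eps, n=100_000):
    a, b = eps, 1.0 - eps
    h = (b - a) / n
    return sum(h / (w * (1.0 - w))
               for w in (a + (k + 0.5) * h for k in range(n)))

for eps in (1e-1, 1e-2, 1e-3):
    print(eps, truncated(eps), 2.0 * math.log((1.0 - eps) / eps))
```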

#### 2.2. One Parameter w

If $v$ had been fixed, $p$ would have been confined to a submanifold $p_w(\cdot)$ parameterised by $w$ alone. The information-geometry metric reduces to

$$g_{ww} = \frac{1}{w(1-w)} \int_0^1 u^2\,\frac{f'(u)^2}{f(u)}\,du = \frac{C}{w(1-w)}. \tag{7}$$

The invariant volume element follows as

$$dV = \sqrt{g_{ww}}\;dw = \left(\frac{C}{w(1-w)}\right)^{1/2} dw \tag{8}$$

where, by construction, $C > 0$. The total invariant volume is finite:

$$V = \int_0^1 \sqrt{g_{ww}}\;dw = \pi\,C^{1/2}. \tag{9}$$
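The finite one-parameter volume (9) rests on $\int_0^1 dw/\sqrt{w(1-w)} = \pi$; a small numerical check (not part of the paper):

```python
import math

# Midpoint rule copes with the integrable endpoint singularities of
# 1/sqrt(w(1-w)) because it never evaluates at w = 0 or w = 1.
def w_integral(n=200_000):
    h = 1.0 / n
    return sum(h / math.sqrt(w * (1.0 - w))
               for w in ((k + 0.5) * h for k in range(n)))

val = w_integral()
print(val, math.pi)   # the sum converges (slowly, via the endpoint cells) to pi

C = 0.01636   # table value for the cubic example f
print(math.pi * math.sqrt(C))   # finite one-parameter volume V = pi * sqrt(C)
```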

#### 2.3. Comparison of One and Two Parameters

Both the shape ((5) versus (8)) and the integral ((6) versus (9)) over $w$ differ qualitatively according to whether or not $v$ is held fixed.

> Treatment of $v$ influences invariant volumes over $w$. [Geometry]
That is a mathematical fact of information geometry.

#### 2.4. Science

For scientific application, (3) defines a wraparound translation-invariant model in which v does not affect w.
> Treatment of $v$ should not influence inference about $w$. [Science]

That is a science requirement. Any observational consequence of information-geometry's invariant volumes would be rejected by the informed scientist. If there were such a consequence, then observation of width $w$ could be used to infer something about location $v$, contrary to the intention of the formulation.

## 3. Conclusions

Information geometry is not science. It denies the independence of separate parameters even though such independence is a fundamental requirement of scientific inquiry. The assumption of a geometrical connection between distributions is unnecessary for science and it fails under test.
Information geometry is a self-consistent mathematical structure which (like any other piece of mathematics) may find specialised application within science, but it is not fundamental to it. The only fundamental connection is the Kullback-Leibler formula (1), which is from-to asymmetric and hence not geometric.

## Funding

This research received no external funding.

## Acknowledgments

This investigation has been refined by many conversations, in particular with Ariel Caticha.

## Conflicts of Interest

The author declares no conflict of interest.

## References

1. Amari, S. Differential-Geometrical Methods in Statistics; Lecture Notes in Statistics; Springer-Verlag: Berlin, Germany, 1985.
2. Fisher, R.A. Theory of statistical estimation. Proc. Camb. Philos. Soc. 1925, 22, 700–725.
3. Rao, C.R. Information and the accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc. 1945, 37, 81–89.
4. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423, 623–656.
5. Knuth, K.H.; Skilling, J. Foundations of inference. Axioms 2012, 1, 38–73.
6. Skilling, J. Critique of information geometry. AIP Conf. Proc. 2013, 1636, 24–29.
Figure 1. Does $v$ affect $w$?
