Information-Geometric Models in Data Analysis and Physics

Bernal-Casas, D.; Oller, José M.

doi:10.3390/math13193114

Open AccessArticle

Information-Geometric Models in Data Analysis and Physics

by

D. Bernal-Casas

^1,*

and

José M. Oller

²

¹

Department of Information and Communication Technologies, Universitat Pompeu Fabra, 08018 Barcelona, Spain

²

Department of Genetics, Microbiology and Statistics, Faculty of Biology, Universitat de Barcelona, 08028 Barcelona, Spain

^*

Author to whom correspondence should be addressed.

Mathematics 2025, 13(19), 3114; https://doi.org/10.3390/math13193114

Submission received: 16 July 2025 / Revised: 16 September 2025 / Accepted: 18 September 2025 / Published: 29 September 2025

(This article belongs to the Special Issue Information-Induced Geometries in Statistics, Data Analysis, and Applied Research)

Download

Browse Figures

Review Reports Versions Notes

Abstract

Information geometry provides a data-informed geometric lens for understanding data or physical systems, treating data or physical states as points on statistical manifolds endowed with information metrics, such as the Fisher information. Building on this foundation, we develop a robust mathematical framework for analyzing data residing on Riemannian manifolds, integrating geometric insights into information-theoretic principles to reveal how information is structured by curvature and nonlinear manifold geometry. Central to our approach are tools that respect intrinsic geometry: gradient flow lines, exponential and logarithmic maps, and kernel-based principal component analysis. These ingredients enable faithful, low-dimensional representations and insightful visualization of complex data, capturing both local and global relationships that are critical for interpreting physical phenomena, ranging from microscopic to cosmological scales. This framework may elucidate how information manifests in physical systems and how informational principles may constrain or shape dynamical laws. Ultimately, this could lead to groundbreaking discoveries and significant advancements that reshape our understanding of reality itself.

Keywords:

Fisher information; kernel methods; Hilbert spaces; Riemannian metric; feature space

MSC:

46; 49; 53; 60; 62

1. Introduction

Information geometry provides insights into understanding data or physical systems by treating data or physical states as points on statistical manifolds, which are equipped with information metrics such as the Fisher information. This viewpoint has become a vibrant interdisciplinary frontier at the crossroads of information and physics, enabling principled analysis of complex data and new information-derived formulations of physical laws [1]. The resulting program promises both conceptual advances—clarifying the link between information and state spaces—and practical benefits for data analysis and the exploration of physical phenomena, ranging from the quantum to the cosmological scale.

Building upon this conceptual foundation, we introduce a robust mathematical framework that leverages advanced geometric and statistical techniques to analyze data residing on Riemannian manifolds [2]. Central to our approach are methods that respect the intrinsic curvature and nonlinear structure of manifold data, such as flow lines of gradient vector fields, exponential and logarithmic maps, and kernel-based principal component analysis (PCA). Throughout the document, we provide a detailed explanation of these tools. They enable accurate, low-dimensional representations and insightful visualization of complex data structures, capturing both the local and global geometric relationships that are critical for understanding the underlying phenomena [3].

By integrating geometric insights with an information-theoretic perspective, our novel framework aims to elucidate the mechanisms by which information emerges in physical systems. This approach reinforces the compelling idea that informational principles are fundamental to the universe’s fabric [4]. It paves the way for exciting possibilities in both pioneering theoretical exploration and significant practical applications across various fields, including data analysis, physics, and more.

The concept that information functions as a fundamental building block of the universe is gaining remarkable momentum among scientists and philosophers [5]. This enlightening perspective suggests that, beyond the realms of matter and energy, information intricately weaves the very fabric of reality, shaping the structure and dynamics of everything from subatomic particles to vast cosmic phenomena. Recognizing information as a core component unlocks deep insights into the nature of physical laws, the origins of the universe, and the intricate interconnectedness of all existence. This paradigm shift not only enhances our understanding but also paves the way for groundbreaking explorations into the universe’s most profound mysteries.

In this initial article, we aim to present a solid mathematical framework and innovative conceptual tools for investigating the deep relationship between information and the physical universe, which has been explored in previous studies [4]. We begin by describing the information sources, the Riemannian structure of the parameter space, and Kernel methods. A significant portion of the article is dedicated to illustrating a data analysis application of the framework, with a particular focus on Kernel PCA. Additionally, we briefly introduce a physical application by expanding upon the variational principle previously published [4]. This information principle will be studied in greater detail in a subsequent manuscript.

By formalizing the principles that govern information processing and its manifestation within physical systems, we aim to uncover the intricate mechanisms through which information may emerge from or be embedded in the fundamental laws of physics. This approach not only establishes strong connections between abstract informational concepts and concrete physical models, but it also provides a solid foundation for investigating how information-theoretic principles could transform our understanding and inspire the development of new theories about the nature of reality.

2. Materials and Methods

In this section, we will meticulously establish a comprehensive mathematical framework by thoroughly defining the information sources and the intricate structure of the parameter space in Section 2.1. We will then examine the rich Riemannian geometry that characterizes this parameter space in Section 2.2, leading to an insightful discussion on the powerful applications of kernel methods in Section 2.3. Section 2.4 examines data analysis in low-dimensional spaces, and Section 2.5 presents a framework for understanding reality.

2.1. The Information Sources and the Parameter Space

Data analysis and physics are inherently connected through their dependence on information sources that supply the data or physical states for examination. However, physical models typically provide a more nuanced and detailed representation of reality, often rooted in causal relationships. With this in mind, we intend to introduce practical tools that can be applied in both fields. Our exploration will begin with an in-depth examination of the vital role of observation, a fundamental aspect that significantly shapes our understanding.

Although the entirety of this phenomenon is located in the observer’s consciousness, it is convenient to simplify the discussion by distinguishing three fundamental elements. The first element is the object studied. The second element is the knowing subject, who is really experimenting: They are aware of what happens. The third element is the environment. These three elements are part of reality in any observation process, and they appear as different elements in the consciousness of the knowing subject, although the reality that underlies and makes observation possible remains hidden. In a simplified sense, we will consider that the said reality is formed by entities that can be viewed as information sources, providing sequences of data for analysis. Some of these entities are used to maintain the underlying structure that makes the internal experience of the knowing subject possible. Let us introduce some notation to establish a convenient and simplified mathematical framework for our work.

To describe the observation process, we will introduce a set,

Θ

, which we will call the parameter space. As a first approximation, we can consider it to be an m-dimensional

C^{\infty}

real manifold, although infinite-dimensional Hilbert or Banach manifolds could also be considered; see [6] for instance. Furthermore, for many purposes, it will suffice to consider the case where

Θ

is a connected open set of

R^{m}

, and in this case, it is customary to use the same symbol,

θ

, to denote points and coordinates. Bearing this observation in mind, we will adopt this case and notation from now on to present the results more familiarly, although they could be written in more general terms.

The manifold

Θ

is really a manifold generated by the observer, regardless of whether this is an immediate abstraction from reality is apprehended directly by the observer’s brain–sensory complex or whether it is constructed from it, following well-defined logical procedures. Observe also that the manifold

Θ

has a natural topology induced by its atlas structure, and thus, this topology allows the construction of the Borel

σ

-algebra,

B

, the

σ

-algebra induced by the topology, that is, the

σ

-algebra generated by the open sets in

Θ

. Moreover, in the measurable space

(Θ, B)

, we may consider a

σ

-finite positive measure

μ

absolutely continuous with respect to any local Lebesgue measure induced through the atlas structure that we shall take as a reference measure of this manifold.

We shall consider that all relevant aspects of the observation process, determined by the object, the subject, the environment, or a mixture of all of them, can be adequately described by the elements of

Θ

through a family of complex-valued maps that we describe below.

Therefore, to further investigate the behavior and properties of information sources, we will introduce a new tool: a regular parametric family of functions,

F_{Θ}

, as a family of measurable maps

h : Θ \times Θ ⟶ C

. These can be expressed in their complex exponential form, that is,

h (ω, θ) = r (ω, θ) e^{i ζ (ω, θ)}

for convenient real-valued functions

r (ω, θ) \geq 0

and

ζ (ω, θ)

, the modulus and the argument of

h (ω, θ)

, when necessary, and we shall assume that the function

ζ

is independent of the reference measure

μ

; however, if we set

θ

, the square of the function r, i.e.,

r^{2} (ω, θ) = h (ω, θ) \bar{h (ω, θ)}

is a probability density with respect to

μ

, that is,

P_{θ} (d ω) = r^{2} (ω, θ) μ (d ω)

is a probability measure in

(Θ, B)

,

\forall θ \in Θ

. The function

r (\cdot, θ)

is therefore dependent on the reference measure

μ

, although the probability measure itself,

P_{θ}

, is independent. Specifically, if

ν

is another reference measure, by the Radon–Nikodym Theorem we shall have that

\frac{d μ}{d ν} = q

for a convenient measurable positive function

q

, then

P_{θ} (d ω) = r^{2} (ω, θ) μ (d ω) = r^{2} (ω, θ) q (ω) ν (d ω) = {\hat{r}}^{2} (ω, θ) ν (d ω)

, obtaining

\hat{r} (ω, θ) = r (ω, θ) \sqrt{q (θ)}

, which explicitly indicates the dependence of r with respect to the reference measure

μ

. In addition, we will implicitly assume all the regularity conditions that we will require for the development of subsequent calculations, such as the necessary smoothness with respect to

θ

and that the closure of the set where the function

r (\cdot, θ)

is strictly positive is independent of the parameter

θ

. We refer to

μ

as a reference measure of the model, h as the model function, and

f (ω, θ) = r^{2} (ω, θ)

as the density of the model below.

2.2. The Riemannian Structure in the Parameter Space

The exploration of Riemannian structures in parameter spaces provides a profound geometric perspective on models and their underlying function spaces. By endowing these spaces with a Riemannian metric, we can investigate the intrinsic geometric properties that influence model behavior, estimation, and inference. Such a perspective enables a more nuanced understanding of the complexity and curvature of the parameter space, offering insights that extend beyond traditional Euclidean approaches.

Letting

L^{2} (Θ, μ)

be the Hilbert space of these complex-valued measurable functions q in

Θ

such that the Lebesgue integral

\int_{Θ} {| q (ω) |}^{2} μ (d ω) < \infty

, where

| z | = \sqrt{z \bar{z}}, z \in C

, if we denote by

D^{2}

the square of the Hilbert distance

D^{2} (p, q) = \int_{Θ} {| p (ω) - q (ω) |}^{2} μ (d ω),

(1)

once given the model function

h (\cdot, θ)

of a parametric family,

F_{Θ}

, since these functions are naturally identifiable with elements of

L^{2} (Θ, μ)

, we are able to build the function

\begin{matrix} δ_{κ}^{2} (γ, θ) & = κ^{2} D^{2} (h (\cdot, γ), h (\cdot, θ)) = κ^{2} \int_{Θ} {| h (ω, γ) - h (ω, θ) |}^{2} μ (d ω) \\ = κ^{2} (2 - \int_{Θ} (h (ω, γ) \bar{h (ω, θ)} + h (ω, θ) \bar{h (ω, γ)}) μ (d ω)), \end{matrix}

(2)

where

κ > 0

is a positive constant which determines the units of the dissimilarity measure

δ_{κ}^{2}

on

Θ

; and when

γ = θ

, we have taken into account

\int_{Θ} h (ω, θ) \bar{h (ω, θ)} μ (d ω) = \int_{Θ} r^{2} (ω, θ) μ (d ω) = \int_{Θ} f (ω, θ) μ (d ω) = 1 \forall θ \in Θ

. Notice that since

| h | = | \bar{h} | = r

, if we change the reference measure

μ

to

ν

accordingly, r changes to

\hat{r}

, and when we integrate (2) now with respect to

ν

, we find that

δ^{2} (γ, θ)

remains invariant.

We are going to consider the Riemannian metric induced in

Θ

by

δ_{κ}^{2}

following the same basic ideas of [7]. If we define

d_{κ, θ}^{2} (γ) = δ_{κ}^{2} (γ, θ)

, we will have

d_{κ, θ}^{2} (θ) = 0

, and, taking into account that the Jacobian of

d_{κ, θ}^{2} (γ)

in

γ = θ

is also null, since the function has an absolute minimum at

θ

, the second-order expansion of

d_{κ, θ}^{2} (γ)

at

θ

will be given by its Hessian at

θ

,

{(d_{κ, θ}^{2} (θ))}^{″} = {({\frac{\partial^{2} d_{κ, θ}^{2}}{\partial γ^{i} \partial γ^{j}}|}_{γ = θ})}_{(m \times m)}

, which will be positive definite or semidefinite. Using matrix notation, we will have

d_{κ, θ}^{2} (γ) = \frac{1}{2} {(γ - θ)}^{T} {(d_{κ, θ}^{2} (θ))}^{″} {(γ - θ) + O (| γ - θ |}^{3}) .

(3)

Notice also that if we consider an appropriate smooth coordinate change given by

γ = (γ^{1}, \dots, γ^{m}) \mapsto \bar{γ} = ({\bar{γ}}^{1}, \dots, {\bar{γ}}^{m})

, the function

d_{κ, θ}^{2} (γ)

expressed under the new coordinate system as

{\bar{d}}_{κ, \bar{θ}}^{2} (\bar{γ})

is an invariant, that is,

d_{κ, θ}^{2} (γ) = {\bar{d}}_{κ, \bar{θ}}^{2} (\bar{γ})

, and we will have, in classical notation, without using the repeated index summation convention criteria unless we explicitly state otherwise, by the chain rule,

\frac{\partial {\bar{d}}_{κ, \bar{θ}}^{2}}{\partial {\bar{γ}}^{i}} = \sum_{α = 1}^{m} \frac{\partial d_{κ, θ}^{2}}{\partial γ^{α}} \frac{\partial γ^{α}}{\partial {\bar{γ}}^{i}} .

(4)

Moreover, since the second-order partial derivatives are

\frac{\partial^{2} {\bar{d}}_{κ, \bar{θ}}^{2}}{\partial {\bar{γ}}^{j} \partial {\bar{γ}}^{i}} = \sum_{α = 1}^{m} \sum_{β = 1}^{m} \frac{\partial^{2} d_{κ, θ}^{2}}{\partial γ^{β} \partial γ^{α}} \frac{\partial γ^{β}}{\partial {\bar{γ}}^{j}} \frac{\partial γ^{α}}{\partial {\bar{γ}}^{i}} + \sum_{α = 1}^{m} \frac{\partial d_{κ, θ}^{2}}{\partial γ^{α}} \frac{\partial^{2} γ^{α}}{\partial {\bar{γ}}^{j} \partial {\bar{γ}}^{i}},

(5)

when evaluating them at

γ = θ

, we will have

\partial d_{κ, θ}^{2} / \partial γ^{α} |_{γ = θ} = 0

. Therefore,

{\frac{\partial^{2} {\bar{d}}_{κ, \bar{θ}}^{2}}{\partial {\bar{γ}}^{j} \partial {\bar{γ}}^{i}}|}_{\bar{γ} = \bar{θ}} = \sum_{α = 1}^{m} \sum_{β = 1}^{m} {\frac{\partial^{2} d_{κ, θ}^{2}}{\partial γ^{β} \partial γ^{α}}|}_{γ = θ (\bar{θ})} {\frac{\partial γ^{β}}{\partial {\bar{γ}}^{j}}|}_{\bar{γ} = \bar{θ}} {\frac{\partial γ^{α}}{\partial {\bar{γ}}^{i}}|}_{\bar{γ} = \bar{θ}},

(6)

where we have denoted by

θ (\bar{θ})

the coordinates of the point

θ

expressed under the transformed coordinate system, obtaining that the components of the Hessian

{(d_{κ, θ}^{2} (γ))}^{″}

at

γ = θ

are the components of a second-order covariant symmetric tensor in the tangent space

T_{θ} (Θ)

; see [8,9] and Appendix A for further details. This tensor is at least semidefinite positive, but hereafter, we will assume the case that is positive definite, as in most of the interesting examples that we will consider. Then, it may be used to define a scalar product at each tangent space, and thus a Riemannian metric in the whole manifold

Θ

. The metric tensor components, under the coordinates

θ = (θ^{1}, \dots, θ^{m})

, will be given by

g_{i j} (θ) = \frac{1}{2} \frac{\partial^{2} d_{κ, θ}^{2}}{\partial γ^{i} \partial γ^{j}} |_{γ = θ} i, j = 1, \dots, m,

(7)

i.e., the components of the metric tensor will be equal to one half times the second partial derivatives of the function

d_{κ, θ}^{2}

, and the line element corresponding to the above-mentioned induced Riemannian metric, under

θ = (θ^{1}, \dots, θ^{m})

, is given through its square by

d s^{2} = \sum_{i = 1}^{m} \sum_{j = 1}^{m} g_{i j} (θ) d θ^{i} d θ^{j} .

(8)

This formulation is typical in classical information geometry, where the metric is derived from second derivatives of divergence functions or similar measures.

If we express the complex-valued function

h (\cdot, θ)

in its exponential form, that is,

h (\cdot, θ) = r (\cdot,, θ) e^{i ζ (\cdot, θ)}

for a convenient real-valued function

ζ

of

(\cdot, θ)

, from (2), taking into account that if

z = r e^{i ζ}

then

\bar{z} = r e^{- i ζ}

, we obtain

\begin{matrix} δ_{κ}^{2} (γ, θ) & = κ^{2} (2 - \int_{Θ} (r (ω, γ) e^{i ζ (ω, γ)} r (ω, θ) e^{- i ζ (ω, θ)} \\ + r (ω, θ) e^{i ζ (ω, θ)} r (ω, γ) e^{- i ζ (ω, γ)}) μ (d ω)) \\ = κ^{2} (2 - \int_{Θ} r (ω, γ) r (ω, θ) (e^{i (ζ (ω, γ) - ζ (ω, θ))} + e^{- i (ζ (ω, γ) - ζ (ω, θ))}) μ (d ω)) \\ = 2 κ^{2} (1 - \int_{Θ} r (ω, γ) r (ω, θ) cos (ζ (ω, γ) - ζ (ω, θ)) μ (d ω)), \end{matrix}

(9)

and then, if we denote by

d_{H}^{2}

the square of the normalized Hellinger distance between the densities

f (ω, γ)

and

f (ω, θ)

\begin{matrix} d_{H}^{2} (γ, θ) & = \frac{1}{2} \int_{Θ} {(\sqrt{f (ω, γ)} - \sqrt{f (ω, θ)})}^{2} μ (d ω) \\ = \frac{1}{2} \int_{Θ} {(r (ω, γ) - r (ω, θ))}^{2} μ (d ω) \\ = 1 - \int_{Θ} r (ω, γ) r (ω, θ) μ (d ω), \end{matrix}

(10)

since

d_{H}^{2} (γ, θ) \geq 0

, we have

0 \leq \int_{Θ} r (ω, γ) r (ω, θ) μ (d ω) = 1 - d_{H}^{2} (γ, θ) .

(11)

Taking into account that

δ_{κ}^{2} (γ, θ) \geq 0

and

- 1 \leq cos (u) \leq 1

, we shall have

\begin{matrix} |\int_{Θ} r (ω, γ) r (ω, θ) cos (ζ (ω, γ) - ζ (ω, θ)) μ (d ω)| & \leq \int_{Θ} r (ω, γ) r (ω, θ) μ (d ω) \\ = 1 - d_{H}^{2} (γ, θ) . \end{matrix}

(12)

Therefore, from (12) we have

\begin{matrix} d_{H}^{2} (γ, θ) & \leq 1 - \int_{Θ} r (ω, γ) r (ω, θ) cos (r (ω, γ) - r (ω, θ)) μ (d γ) \\ \leq 2 - d_{H}^{2} (γ, θ), \end{matrix}

(13)

and we finally obtain

2 κ^{2} d_{H}^{2} (γ, θ) \leq δ_{κ}^{2} (γ, θ) \leq 2 κ^{2} (2 - d_{H}^{2} (γ, θ)) .

(14)

Observe, additionally, that for

κ = 2

, the quantity

8 d_{H}^{2} (γ, θ)

is the square of the distance between

2 \sqrt{f (\cdot, γ)}

and

2 \sqrt{f (\cdot, θ)}

, functions that are on a radius 2 sphere of

L^{2} (Θ, μ)

. Additionally, the aforementioned amount is, locally, the information metric on

Θ

; for instance, see [7].

\begin{matrix} \frac{\partial d_{κ, θ}^{2}}{\partial γ^{α}} & = - 2 κ^{2} (\int_{Θ} (\frac{\partial r}{\partial γ^{α}} (ω, γ) r (ω, θ) cos (ζ (ω, γ) - ζ (ω, θ)) \\ - r (ω, γ) r (ω, θ) sin (ζ (ω, γ) - ζ (ω, θ) \frac{\partial ζ}{\partial γ^{α}} (ω, γ)) μ (d ω))) . \end{matrix}

(15)

At the point

γ = θ

, we have

\begin{matrix} {\frac{\partial d_{κ, θ}^{2}}{\partial γ^{α}}|}_{γ = θ} = - 2 κ^{2} (\int_{Θ} {\frac{\partial r}{\partial γ^{α}} (ω, γ)|}_{γ = θ} r (ω, θ) μ (d ω)) = 0, \end{matrix}

(16)

since

\int_{Θ} r^{2} (ω, θ) μ (d ω) = 1

, and any partial derivative with respect to

θ^{α}

is equal to zero.

Additionally, since the second-order partial derivatives of

d_{κ, θ}^{2}

divided by 2 are

\begin{matrix} \frac{1}{2} \frac{\partial^{2} d_{κ, θ}^{2}}{\partial γ^{β} \partial γ^{α}} = & - κ^{2} (\int_{Θ} (\frac{\partial^{2} r}{\partial γ^{β} \partial γ^{α}} (ω, γ) r (ω, θ) cos (ζ (ω, γ) - ζ (ω, θ)) \\ - \frac{\partial r}{\partial γ^{α}} (ω, γ) r (ω, θ) sin (ζ (ω, γ) - ζ (ω, θ)) \frac{\partial ζ}{\partial γ^{β}} (ω, γ) \\ - \frac{\partial r}{\partial γ^{β}} (ω, γ) r (ω, θ) sin (ζ (ω, γ) - ζ (ω, θ)) \frac{\partial ζ}{\partial γ^{α}} (ω, γ) \\ - r (ω, γ) r (ω, θ) cos (ζ (ω, γ) - ζ (ω, θ) \frac{\partial ζ}{\partial γ^{α}} (ω, γ) \frac{\partial ζ}{\partial γ^{β}} (ω, γ)) \\ - r (ω, γ) r (ω, θ) sin (ζ (ω, γ) - ζ (ω, θ) \frac{\partial^{2} ζ}{\partial γ^{β} \partial γ^{α}} (ω, γ)) μ (d ω))), \end{matrix}

(17)

at the point

γ = θ

, we have

\begin{matrix} {\frac{1}{2} \frac{\partial^{2} d_{κ, θ}^{2}}{\partial γ^{β} \partial γ^{α}}|}_{γ = θ} = & - κ^{2} (\int_{Θ} ({\frac{\partial^{2} r}{\partial γ^{β} \partial γ^{α}} (ω, γ)|}_{γ = θ} r (ω, θ) \\ - r^{2} (ω, θ) {\frac{\partial ζ}{\partial γ^{α}} (ω, γ)|}_{γ = θ} {\frac{\partial ζ}{\partial γ^{β}} (ω, γ))|}_{γ = θ}) μ (d ω)) . \end{matrix}

(18)

However, if we take into account that

r (ω, θ) = \sqrt{f (ω, θ)}

is the real square root of a probability density with respect to the reference measure

μ

, we shall have

\begin{matrix} {\frac{1}{2} \frac{\partial^{2} d_{κ, θ}^{2}}{\partial γ^{β} \partial γ^{α}}|}_{γ = θ} = & - κ^{2} (\int_{Θ} ({\frac{\partial^{2} \sqrt{f}}{\partial γ^{β} \partial γ^{α}} (ω, γ)|}_{γ = θ} \sqrt{f (ω, θ)} \\ - f (ω, θ) {\frac{\partial ζ}{\partial γ^{α}} (ω, γ)|}_{γ = θ} {\frac{\partial ζ}{\partial γ^{β}} (ω, γ))|}_{γ = θ}) μ (d ω)), \end{matrix}

(19)

and since

\begin{matrix} \frac{\partial \sqrt{f}}{\partial θ^{α}} = \frac{1}{2} \frac{\partial ln f}{\partial θ^{α}} \sqrt{f} and \frac{\partial^{2} \sqrt{f}}{\partial θ^{β} \partial θ^{α}} = - \frac{1}{4} \frac{\partial ln f}{\partial θ^{β}} \frac{\partial ln f}{\partial θ^{α}} \sqrt{f} + \frac{1}{2 \sqrt{f}} \frac{\partial^{2} f}{\partial θ^{β} \partial θ^{α}}, \end{matrix}

(20)

we have

\begin{matrix} {\frac{1}{2} \frac{\partial^{2} d_{κ, θ}^{2}}{\partial γ^{β} \partial γ^{α}}|}_{γ = θ} = & \frac{κ^{2}}{4} \int_{Θ} ({\frac{\partial ln f}{\partial γ^{β}} (ω, γ)|}_{γ = θ} {\frac{\partial ln f}{\partial γ^{α}} (ω, γ)|}_{γ = θ} f (ω, γ) \\ - & {2 \frac{\partial^{2} f}{\partial γ^{β} \partial γ^{α}} (ω, γ)|}_{γ = θ} + f (ω, γ) {\frac{\partial ζ}{\partial γ^{α}} (ω, γ)|}_{γ = θ} {\frac{\partial ζ}{\partial γ^{β}} (ω, γ)|}_{γ = θ}) μ (d ω) \\ = & \frac{κ^{2}}{4} (\int_{Θ} {\frac{\partial ln f}{\partial γ^{β}} (ω, γ)|}_{γ = θ} {\frac{\partial ln f}{\partial γ^{α}} (ω, θ)|}_{γ = θ} f (ω, γ) μ (d ω) \\ + & 4 \int_{Θ} {\frac{\partial ζ}{\partial γ^{α}} (ω, γ)|}_{γ = θ} {\frac{\partial ζ}{\partial γ^{β}} (ω, γ)|}_{γ = θ} f (ω, γ) μ (d ω)) . \end{matrix}

(21)

Moreover, let

I = (i_{α β})

be the

(m \times m)

Fisher information matrix corresponding to the density of the model, that is,

i_{α β} = \int_{Θ} \frac{\partial ln f}{\partial θ^{α}} \frac{\partial ln f}{\partial θ^{β}} f d μ

, and additionally define

ϰ_{α β} (θ) = 4 \int_{Θ} {\frac{\partial ζ}{\partial γ^{α}} (ω, γ)|}_{γ = θ} {\frac{\partial ζ}{\partial γ^{β}} (ω, γ)|}_{γ = θ} f (ω, γ) μ (d ω) .

(22)

We obtain that the fundamental tensor of

Θ

, (7), converting it as a Riemannian manifold, is equal to

g_{α β} (θ) = \frac{κ^{2}}{4} (i_{α β} (θ) + ϰ_{α β} (θ)) .

(23)

The Riemannian metric, using repeated index summation convention, can also be expressed as

d s^{2} = g_{α β} d θ^{α} d θ^{β} = \frac{κ^{2}}{4} (i_{α β} d θ^{α} d θ^{β} + ϰ_{α β} d θ^{α} d θ^{β}) = \frac{κ^{2}}{4} (d s_{I}^{2} + d s_{A}^{2}),

(24)

where

d s_{I}^{2}

is the information metric, and

d s_{A}^{2}

is a metric specifically related to the imaginary part of the function that defines the model. Observe that locally the Riemannian metric induced in

Θ

is the distance induced by the Hilbert metric structure in

L^{2} (Θ, d μ)

on a radius 2 sphere and that it coincides, when

κ = 2

, with the information metric when

ζ

is constant. More results on the information metric can be found in [7,10,11], among others. Additional aspects of the information metric are briefly recalled in Appendix A.

This structure suggests a Kähler-like or complex Riemannian geometry, often encountered in quantum information geometry, where complex structures naturally appear, and the metric encodes both probabilistic and phase information. Quantum information geometry explores the geometric aspects of quantum states and their transformations. Unlike classical probability spaces, quantum state spaces are inherently complex and require a framework that can incorporate both the probabilistic nature of quantum mechanics and the phase information, which is crucial for phenomena like interference and entanglement. The complex structure naturally appears because quantum states are represented by vectors in complex Hilbert spaces, and their evolution and measurement outcomes depend on both amplitude and phase.

The metric in this geometric setting is not just a measure of “distance” between states; it encodes multiple layers of information. On one hand, it captures probabilistic information—how distinguishable two quantum states are based on measurement outcomes—mirroring classical statistical distances but adapted to quantum mechanics. On the other hand, it also encodes phase information, reflecting the relative quantum phases that influence interference effects and the dynamics of quantum systems.

By combining these elements, the described structure offers a comprehensive geometric language that unifies the probabilistic and phase aspects of quantum states. This unified approach facilitates an understanding of quantum state transformations, optimal measurements, and quantum information processing tasks. The resemblance to Kähler geometry provides powerful mathematical tools, such as complex differential calculus and symplectic geometry, which facilitate the analysis of the complex landscape of quantum states and their evolutions. Overall, this geometric perspective enhances our conceptual and computational grasp of quantum phenomena, revealing deep connections between geometry, probability, and phase in the quantum realm.

Please note that while we assume that all properties of an information source are determined by

θ

or

h (\cdot, θ)

, the exact value of

θ

has to be estimated from the data generated by the source. Although we assume that these data can remain partially hidden, it should still allow a reasonably good estimate of both

θ

and

h (\cdot, θ)

.

2.3. Kernel Methods

Kernel methods have become a cornerstone of modern machine learning, enabling flexible and powerful approaches to non-linear data analysis. By implicitly mapping data into high-dimensional (often infinite-dimensional) feature spaces, kernel methods enable linear algorithms to capture intricate patterns that would be difficult to model directly in the original input space. This approach not only enhances the expressive power of models, but it also provides computational advantages through the so-called “kernel trick,” which allows inner products in feature space to be computed efficiently without explicit transformation. At the heart of kernel methods lies the concept of defining kernels based on underlying parametric families.

Given a regular parametric family

F_{Θ}

, we can define a natural kernel in

Θ

, using

L^{2} (Θ, μ)

as a feature space, the Hilbert space of measurable functions with complex value and square integrability in

Θ

, and

J

as a feature map, which maps

Θ

to

L^{2} (Θ, μ)

, defined as

J (θ) = h (\cdot, θ)

. Then, we can define a kernel in

Θ

as

K (θ, γ) = κ^{2} {〈J (θ), J (γ)〉}_{L^{2} (Θ, μ)} = κ^{2} \int_{Θ} h (ω, γ) \bar{h (ω, θ)} μ (d ω),

(25)

for a convenient positive proportionality constant

κ > 0

.

A kernel can be interpreted as a measure of similarity between points in the parameter space, and it allows us to identify, though a kernel K, each

θ \in Θ

with a complex-valued function,

K (\cdot, θ)

, which we assume to contain all relevant features of the point

θ

, or, more formally, through a function

ϕ

, which we refer to as the canonical feature map, defined as

\begin{matrix} ϕ : Θ & \to & C^{Θ} = {f | f : Θ \to C} \\ θ & \mapsto & ϕ (θ) = K (\cdot, θ) . \end{matrix}

(26)

Observe that

ϕ (θ)

is properly a function, much simpler than the previously mentioned

J (θ)

, which is an element of

L^{2} (Θ, μ)

and, therefore, an equivalent class of functions, although it can be identified with one of the equivalence classes to which it belongs. We will consider hereafter that the parametric family

F_{Θ}

is such that the corresponding map

ϕ

, with the natural kernel (25), is 1–1, i.e.,

K (\cdot, θ) = K (\cdot, γ)

then

θ = γ

, as happens in most cases. Then, we can identify

Θ

with

ϕ (Θ)

for almost all pursuit purposes. Notice that

ϕ (Θ)

, which we will call the feature manifold, inherits a natural m-dimensional differentiable manifold structure induced by the manifold structures of

Θ

. The parameter

θ

, identifiable with

ϕ (θ)

, characterizes what we shall call the state of the observation process and can be expressed using the complex exponential form when necessary.

At this point, it is convenient to outline the following well-known mathematical construction that will lead us to a simpler feature space. See [12,13] for details. We start by recalling the concept of a positive definite kernel. A map

K : Θ \times Θ \to C

is a positive definite kernel if and only if, for any mutually distinct points

θ_{1}, \dots, θ_{n} \in Θ

and for any scalars

z_{1}, \dots, z_{n} \in C

, where n is an arbitrary positive integer, we have that

\sum_{i, j = 1}^{n} z_{i} \bar{z_{j}} K (θ_{i}, θ_{j}) \geq 0

where

\bar{z_{j}}

is the complex conjugate of

z_{j}

. Notice that the Hermitian property,

K (θ, γ) = \bar{K (γ, θ)}

, is a consequence of the positive-definiteness. Moreover, if for mutually distinct

θ_{1}, \dots, θ_{n}

, the equality only holds for

z_{1}, = \dots = z_{n} = 0

, it is often said that K is a strictly positive definite Hermitian kernel, although there are small variations in that name to designate this property according different authors. It is also possible to consider real-valued kernels satisfying

K : Θ \times Θ \to R

such that

K (θ, γ) = K (γ, θ)

and

\sum_{i, j = 1}^{n} α_{i} α_{j} K (θ_{i}, θ_{j}) \geq 0

for any positive integer

n \in N

,

α_{i} \in R

,

θ_{i} \in Θ

i = 1, \dots, n

. Observe that (25) satisfies all of the above-mentioned properties, and is thus, in addition, a positive definite kernel because

K (θ, γ) = \bar{K (γ, θ)}

, since clearly

\bar{\int_{Θ} q (ω) μ (d ω)} = \int_{Θ} \bar{q (ω)} μ (d ω)

and for any scalars

z_{1}, \dots, z_{n} \in C

, where n is an arbitrary positive integer,

\sum_{i, j = 1}^{n} z_{i} \bar{z_{j}} K (θ_{i}, θ_{j}) = κ^{2} \sum_{i, j = 1}^{n} z_{i} \bar{z_{j}} \int_{Θ} h (ω, θ_{i}) \bar{h (ω, θ_{j})} μ (d ω) = κ^{2} \int_{Θ} (\sum_{i = 1}^{n} z_{i} h (ω, θ_{i})) \bar{(\sum_{j = 1}^{n} z_{j} h (ω, θ_{j}))} μ (d ω) \geq 0

. Therefore,

K (θ, γ)

it is a complex-valued Hermitian kernel.

Therefore, taking into account that

ϕ (Θ) \subset C^{Θ}

and that

C^{Θ}

has a natural vector space structure over

C

, corresponding to each kernel K, we can build a vector space

P_{K} = s p a n (ϕ (Θ)) \subset C^{Θ}

, i.e., the set of all finite linear combinations of elements of

ϕ (Θ)

. Observe also that the elements of

P_{K}

are complex-valued functions f with domains the parameter space,

Θ

, which can be expressed as

f (\cdot) = \sum_{i = 1}^{n} z_{i} K (\cdot, θ_{i})

for a convenient

n \in N

and

θ_{1}, \dots, θ_{n} \in Θ

,

z_{1}, \dots, z_{n} \in C

.

Moreover, the kernel K allows one to define a Hermitian inner product

〈 \cdot, \cdot 〉

in

P_{K}

as follows: If

f, g \in P_{K}

, then

f (\cdot) = \sum_{i = 1}^{n_{1}} η_{i} K (\cdot, θ_{i})

and

g (\cdot) = \sum_{j = 1}^{n_{2}} ς_{j} K (\cdot, γ_{j})

, with the Hermitian inner product defined as

〈 f, g 〉 = \sum_{i = 1}^{n_{1}} \sum_{j = 1}^{n_{2}} η_{i} ς_{j} K (γ_{j}, θ_{i}),

(27)

which turns

P_{K}

into a pre-Hilbert space. After a standard completion, we can turn our pre-Hilbert vector space

P_{K}

into a true Hilbert space

H_{K}

, which is also a norm and a metric space with the standard norm and the distance defined as from (27). Moreover, this Hilbert space is the reproducing kernel Hilbert space (RKHS) induced by the kernel function K, see [13,14], which satisfies the so-called reproducing property, i.e.,

〈 f, K (\cdot, θ) 〉 = f (θ)

. Therefore,

〈 ϕ (θ), ϕ (γ) 〉 = 〈 K (\cdot, θ), K (\cdot, γ) 〉 = K (γ, θ)

. Additionally, it is a functional Hilbert space, i.e., a Hilbert space in which elements are functions (on

Θ

) such that the point evaluation functional

E_{θ} (f) = 〈 f, K (\cdot, θ) 〉 = f (θ)

is a continuous linear functional on

H_{K}

, with norm

| E_{θ} | = \sqrt{K (θ, θ)} = κ

, because for the natural kernel (25) we have

K (θ, θ) = κ^{2}

since

\int_{Θ} {| h (ω, θ) |}^{2} μ (d ω) = 1

. Let us recall now that with additional assumptions about the parameter space from measure and topological theory, many other properties of kernels can be established, like Mercer’s theorem and relationships of square-integrable function spaces to RKHS, via linear operators; see, for instance, ref. [13]. Notice also that the distance in the feature space,

d_{H_{K}}

, induces a distance in the parameter space,

d_{Θ}

, through the map

ϕ

such that

\begin{matrix} d_{Θ}^{2} (θ, γ) & = d_{H_{K}}^{2} (ϕ (θ), ϕ (γ)) = 2 κ^{2} - K (θ, γ) - K (γ, θ) \\ = 2 κ^{2} - κ^{2} \int_{Θ} r (ω, γ) r (ω, θ) e^{i (ζ (ω, γ) - ζ (ω, θ))} μ (d ω)) \\ - κ^{2} \int_{Θ} r (ω, θ) r (ω, γ) e^{- i (ζ (ω, γ) - ζ (ω, θ))} μ (d ω)) \\ = 2 κ^{2} (1 - \int_{Θ} r (ω, γ) r (ω, θ) cos (ζ (ω, γ) - ζ (ω, θ)) μ (d ω)), \end{matrix}

(28)

which is equal to (9). Observe that this distance reflects the differences between the parameter space points—

θ, γ \in Θ

—or, equivalently, between their images—

ϕ (θ), ϕ (γ) \in ϕ (Θ)

—and locally defines in any of these manifolds

Θ

or

ϕ (Θ)

, the Riemannian metric (24), when

κ = 2

.

We finally remark that the obtained RKHS,

H_{K}

, which can be identified as a closed subspace of

L^{2} (Θ, μ)

, will also be considered as a feature space of the kernel K, defined in (25), and can be considered as the smallest feature space corresponding to this kernel. Indeed, if

W

is another Hilbert space that allows us to obtain the same kernel K, as in

W = L^{2} (Θ, μ)

, then there exists a canonical metric surjection from

W

onto

H_{K}

. As an aside, in later physical applications, we will think of this RKHS as the most convenient abstract model for identifying objects in the physical world, viewing them as vectors in this infinite-dimensional Hilbert space.

Therefore, at this moment, a point in the parameter space

Θ

can be viewed as an element

θ

of the manifold

Θ

, a function of

L^{2} (Θ, μ)

,

κ h (\cdot, θ)

, a function of

H_{K}

,

K (\cdot, θ)

, or any other equivalent mathematical object, such as composing the function h with a suitable 1-1 smooth function, i.e.,

log h (\cdot, θ)

, for example.

For the sake of simplicity, we continue considering that

Θ

as an open subset of

R^{m}

and, with some customary abuse of notation, given the local chart

(I, Θ)

where I is the identity map, we will identify its points with their coordinates. Under this simplified approach,

θ = (θ^{1}, \dots, θ^{m}) \in Θ

can also be viewed as the coordinates of

ϕ (θ)

, a point in the feature manifold. Figure 1 illustrates this framework.

It is also interesting to examine the different representations of the tangent space of

θ

in

Θ

. Under adequate regularity conditions, an obvious representation consists of considering the span of the functions

T_{θ} Θ \equiv 〈κ \frac{\partial h}{\partial θ^{1}} (\cdot, θ), \dots, κ \frac{\partial h}{\partial θ^{m}} (\cdot, θ)〉

. The Hilbert inner product between them will be given by

{〈κ \frac{\partial h}{\partial θ^{i}} (\cdot, θ), κ \frac{\partial h}{\partial θ^{j}} (\cdot, θ)〉}_{L^{2} (Θ, μ)} = κ^{2} \int_{Θ} \frac{\partial h}{\partial θ^{i}} (ω, θ) \bar{\frac{\partial h}{\partial θ^{j}} (ω, θ)} μ (d ω) .

(29)

Furthermore, with adequate smooth and continuity assumptions, the tangent space will also be identifiable with the space spanned by a linear combination of the partial derivatives of the functions obtained through the canonical feature map,

T_{θ} Θ \equiv 〈\frac{\partial K}{\partial θ^{1}} (\cdot, θ), \dots, \frac{\partial K}{\partial θ^{m}} (\cdot, θ)〉

, with the Hilbert inner product between them given by

{〈\frac{\partial K}{\partial θ^{i}} (\cdot, θ), \frac{\partial K}{\partial θ^{j}} (\cdot, θ)〉}_{H_{K}} = \frac{\partial^{2} K}{\partial θ^{j} \partial θ^{m + i}} (θ, θ) = \frac{\partial^{2} K}{\partial θ^{m + j} \partial θ^{i}} (θ, θ) .

(30)

Notice that, given a smooth map

w : Θ \times Θ ⟶ C

, if we fix a point on the manifold

ζ \in Θ

, then we obtain a function

w_{ζ} \equiv w (ζ, \cdot)

, which, under regularity conditions, may be approximated through an intrinsic Taylor expansion of rth at

θ \in Θ

, the coefficients of which are the components of convenient tensors—that is, an expansion independent of the coordinate system in

Θ

constructed via the inverse of the Riemannian exponential map corresponding to (24), specifically

w_{ζ} (γ) = w_{ζ} (θ) + \sum_{k = 1}^{r} \frac{1}{k!} (\nabla^{k} w_{ζ}) (\exp_{θ}^{- 1} (γ), \overset{k)}{\dots}, \exp_{θ}^{- 1} (γ)) + R_{r, θ} (γ) .

(31)

Here,

R_{r, θ} (γ)

is their corresponding residual,

\nabla^{k}

indicates the covariant derivative of k-th order corresponding to the Levi–Civita connection of the Riemannian metric (24), and we have assumed that

γ

belongs to a regular normal neighborhood of

θ

to ensure the existence of the inverse of the exponential map of the connection mentioned above, which is always the case if

γ

is close enough to

θ

.

This development shows that the function

w (γ, \cdot)

has different associated covariant tensor fields that can be used in future developments, as well as the contravariant or mixed tensors required by the well-known operation of raising or lowering tensor indices in a Riemannian manifold. Several of the concepts introduced in the present section will be briefly described in Appendix A.

Observe also that, given a second-order mixed tensor field

\hat{T}

, it corresponds to a linear map operator field,

L_{\hat{T}}

, defined through

ω (L_{\hat{T}} (Υ)) = \hat{T} (ω, Υ), \forall ω \in T^{*} Θ and \forall Υ \in T Θ

. In any case, we hope to relate these fields of linear operators on the tangent bundle of

Θ

,

T Θ

, with the classical spectral analysis of these operators in a Hilbert space. In other words, we can consider the appropriate spectral decompositions carried out in each tangent space and eventually interpret them in physical terms, at least as an approximation. Furthermore, through this framework, we can view the statistical estimates of

θ

, say

γ

, as convenient tensor fields in

T Θ

.

Additionally, it may be useful to apply the same development to the expected value of these estimations. Given a regular parametric family of functions

F_{Θ}

, defined as before, let

M_{w} : Θ ⟶ C

be defined as

M_{w} (θ) = \int_{Θ} w (γ, θ) r^{2} (γ, θ) μ (d γ) .

(32)

As before, under regularity conditions, this map may be approximated through an intrinsic Taylor expansion of the rth at

θ \in Θ

, that is,

M_{w} (γ) = M_{w} (θ) + \sum_{k = 1}^{r} \frac{1}{k!} (\nabla^{k} M_{w}) (\exp_{θ}^{- 1} (γ), \overset{k)}{\dots}, \exp_{θ}^{- 1} (γ)) + R_{r, θ} (γ),

(33)

where

R_{r, θ} (γ)

is their corresponding residual.

2.3.1. The Case of the Gradient Operator

It might sometimes be convenient to consider and represent, in the above framework, real-valued smooth functions defined on

Θ

and their variations throughout the manifold. However, the problem can be stated equivalently in

ϕ (Θ)

through the canonical feature map, since we assume that the canonical feature map

ϕ

is 1–1, and therefore, there is a natural identification not only between smooth functions on both manifolds but also between vectors and tensors via the map

T

, defined as

T : R^{Θ} \to R^{ϕ (Θ)}

, with

f \mapsto T (f) = f \circ ϕ^{- 1}

and its differential. In other words, we may identify our parameter space

Θ

with its image through the feature map. Figure 1 can help visualize all these considerations.

In this context, it will be convenient to represent a certain function q, for example,

q = w_{ζ}

or

q = M_{w}

, in terms of its gradient vector field (see Appendix A for details) since, from a geometric point of view, the gradient of a function indicates the direction of maximum growth of this function per unit of length. Observe that, under the coordinate system

θ = (θ^{1}, \dots, θ^{m})

, the components of the gradient of q are

ξ^{α} = grad {(q)}^{α} (θ) = \sum_{β = 1}^{m} g^{α β} (θ) {(\frac{\partial q}{\partial θ^{β}})}_{θ}, α = 1, \dots, m,

(34)

where

g^{α β} (θ)

are the components of the contravariant metric tensor, i.e., the components of the scalar product in the dual of the tangent space at

θ

,

T_{θ} Θ^{*}

, equal to the components of the inverse of the matrix

G = {(g_{α β} (θ))}_{m \times m}

at

θ

. To simplify the notation, we have identified the points of

Θ

with their coordinates and

{(\partial q / \partial θ^{β})}_{θ}

with the ordinary partial derivative in

R^{m}

,

D_{β} q (θ)

. Notice that in the graphic outputs, in order to describe the behavior of the function q along the entire manifold, it will be enough to represent the field

ξ

at some specific points of

Θ

of

ϕ (Θ)

: those where it seems most relevant to the applied researcher to determine the direction of growth of q. Observe also that the represented gradient vector at one point will be distorted due to the multidimensional character of the problem, which would have implied some reduction of the dimension, as we will see later. So, it is recommended to calculate the final percentage of the norm of the represented vector. We also note that the growth directions may vary significantly along the manifold.

In a complementary way, it may be interesting to find the curves

σ

corresponding to the integral flow of the gradient, that is, the curves whose tangent vectors at t are

σ^{'} (t) = grad (q)

, which indicate, locally, the maximum variation directions of q per unit of length; see Figure 2. Observe that the trace of these curves orthogonally cuts the level hypersurfaces, a multidimensional extension of the typical contour lines of geographic maps, which we could hypothetically examine. In any case, the posterior representation of these curves in 2D or 3D will be affected by ordinary problems of dimension reduction.

Precisely, under the coordinates

θ = (θ^{1}, \dots, θ^{m})

, and denoting the curve

σ (t)

by its coordinates

θ (t)

, the integral flow of the gradient of q is the general solution of the first-order differential equation system

\frac{d θ^{α}}{d t} = \sum_{β = 1}^{m} g^{α β} (θ) {(\frac{\partial q}{\partial θ^{β}})}_{θ}, α = 1, \dots, m,

(35)

which always has a local solution given initial conditions

σ (t_{0}) = θ_{0}

, granted by the Picard–Lindelöf theorem; see, for instance [15]. In any case, the curves obtained corresponding to different initial conditions will have to be represented in a low-dimensional Euclidean space, which could potentially generate many interpretation problems, at least if we move away from their corresponding initial points. Therefore, we conclude the following observation:

Although the system (35) could be explicitly solved, at least numerically, the most useful way to represent the function q is by evaluating the gradient vector field of q at some particular points of the manifold. This representation will also facilitate the interpretation of the bi- or three-dimensional Euclidean space of representation.

2.3.2. Projecting onto an Affine Manifold

Consider now the manifold

ϕ (Θ)

immersed in

H_{K}

. The tangent space at

ϕ (θ)

can be identified with the affine manifold

L_{θ}

, which is determined by an m-dimensional subspace spanning

F_{θ}

with adequate smoothness and continuity assumptions, by the vectors

F_{θ} = 〈\frac{\partial K}{\partial θ^{1}} (\cdot, θ), \dots, \frac{\partial K}{\partial θ^{m}} (\cdot, θ)〉

, with the Hilbert inner product between them given by (30), and a vector

ϕ (θ) \in L_{θ} \subset H_{K}

, which, in a certain sense, we can take as the origin of

L_{θ}

. Any vector

η \in L_{θ} \subset H_{k}

can be written as

η = ϕ (θ) + \sum_{j = 1}^{m} β^{j} \frac{\partial K}{\partial θ^{j}} (\cdot, θ)

. If we introduce matrix notation, defining the

m \times 1

column vectors

β = {(β^{1}, \dots, β^{m})}^{T}

and

K (θ) = (〈 η - ϕ (θ)

and

\frac{\partial K}{\partial θ^{1}} (\cdot, θ) 〉_{H_{K}}, \dots, {〈 η - ϕ (θ), \frac{\partial K}{\partial θ^{m}} (\cdot, θ) 〉}_{H_{K}})^{T}

and the

m \times m

matrix

G (θ) = ({〈\frac{\partial K}{\partial θ^{i}} (\cdot, θ), \frac{\partial K}{\partial θ^{j}} (\cdot, θ)〉}_{H_{K}})

, the components of

β

will be given by the following equation:

β = G^{- 1} (θ) K (θ) .

(36)

Additionally, if

Π_{F_{θ}}

is the projection operator onto

F_{θ}

, each point in

H_{K}

induces a tensor or vector field in

ϕ (Θ)

via the projection operator field

Π_{F_{θ}}

that should be deeply connected, at least as an approximation, with the Hermitian linear operators corresponding to convenient observables in quantum mechanics formulation.

2.4. Data Analysis Applications: Mapping Objects in a Low-Dimensional Space

If we are interested in representing data points in the parameter space

Θ

and real functions of

R^{Θ}

in a low-dimensional Euclidean space, usually a plane, we will have two different strategies. In the first, we will require that the final Euclidean distance between points in the plot be, as far as possible, similar to the distance

H_{K}

between their images in

ϕ (Θ)

, which shall lead us to a type of kernel PCA method (Principal Component Analysis). In contrast, in the second, we shall require that the final Euclidean distance in the plot be similar to the Riemannian manifold distance in

Θ

of

ϕ (Θ)

induced by the metric in

H_{K}

, as in (24). In this case, we shall first map the points in the manifold into a tangent space through the inverse of the exponential map corresponding to the Levi–Civita connection. In this Euclidean space, we shall then use the standard PCA method. Below, we will provide a schematic description of both procedures.

2.4.1. Working in the Feature Space

Among all points we are studying, we may select a specific set of points, which we shall refer to hereafter as principal points, to represent with some optimal property. Specifically, given any set of these principal points

γ_{1}, \dots, γ_{s} \in Θ

and given a positive definite Hermitian kernel K in

Θ

, we can map them into

H_{K}

via the map

ϕ (γ) = K (\cdot, γ)

, and then we can perform a kernel PCA in the reproducing kernel Hilbert space induced by the kernel,

H_{K}

. One primary purpose of this analysis is to obtain a low-dimensional Euclidean data representation, such that the distances between the plotted points accurately reflect the differences between the observations. To achieve this, we shall project the points

ϕ (γ_{i})

onto a convenient r-dimensional linear manifold

L

(usually with

r = 2

or 3) immersed in

H_{K}

. A linear manifold

L

is determined by an r-dimensional subspace

F

spanned by an orthonormal basis

v_{1}, \dots, v_{r} \in H_{K}

and a vector

ξ \in H_{K}

which defines the origin of the data representation. Any vector

η \in L \subset H_{K}

can be written as

η = ξ + \sum_{j = 1}^{r} α_{j} v_{j},

and if

Π_{F}

is the projection operator onto

F

, each observation

γ

will be represented as a single point in a r-dimensional Cartesian coordinates system, with coordinates

α_{1}, \dots, α_{r}

, satisfying

Π_{F} (ϕ (γ) - ξ) = \sum_{j = 1}^{r} 〈 (ϕ (γ) - ξ), v_{j} 〉 v_{j} = \sum_{j = 1}^{r} α_{j} v_{j}

. The next step is to find the linear r-dimensional manifold

L

following a convenient optimization criterion. Two approaches lead to the same result:

(a): To obtain a linear r-dimensional manifold $L$ immersed in $H_{K}$ , minimize

$\begin{matrix} D (v_{1}, \dots, v_{r}, ξ) & : = \sum_{j = 1}^{s} δ^{2} (ϕ (γ_{j}), L), where \\ δ^{2} (ϕ (γ_{j}), L) & = : inf d^{2} (ϕ (γ_{j}), y) \\ y \in L . \end{matrix}$

(37)

Observe that $D$ is a goodness-of-fit measure. If we minimize this quantity, we shall obtain the linear manifold $L$ of closest fit to the system of points $ϕ (γ_{1}), \dots, ϕ (γ_{s})$ in the space $H_{K}$ .
(b): To obtain a linear r-dimensional manifold $L$ immersed in $H_{K}$ , maximize

$\begin{matrix} Δ (v_{1}, \dots, v_{r}, ξ) & : = \sum_{j = 1}^{s} \sum_{k = j}^{s} d^{2} (Π_{F} (ϕ (γ_{j}) - ξ), Π_{F} (ϕ (γ_{k}) - ξ)) . \end{matrix}$

(38)

Notice that

Δ

is a measure of the resolving power of the final output. By maximizing this quantity we shall obtain points separated as much as possible in the low-dimensional Euclidean data representation.

Observe also that both

D

and

Δ

are non-negative real-valued functions with the variables

ξ, v_{1}, \dots, v_{r}

, i.e., the vectors which determine

L

. Taking into account that

δ^{2} (ϕ (γ_{i}), L) = {∥ Π_{F^{⊥}} (ϕ (γ_{i}) - ξ) ∥}^{2},

(39)

where

∥ \cdot ∥

is the norm in

H_{K}

and

Π_{F^{⊥}}

is the projection operator onto

F^{⊥}

, and defining

\bar{ϕ} : = \frac{1}{s} \sum_{i = 1}^{s} ϕ (γ_{i}),

(40)

as the average feature, we have

D (v_{1}, \dots, v_{r}, ξ) = D (v_{1}, \dots, v_{r}, \bar{ϕ}) + s {∥ Π_{F^{⊥}} (ξ - \bar{ϕ}) ∥}^{2} .

(41)

In order to minimize (a), it is straightforward to prove that we have to choose

ξ = \bar{ϕ} : = \frac{1}{s} \sum_{i = 1}^{s} ϕ (γ_{i}) .

(42)

Let

E

be the linear span of

ϕ (γ_{1}) - \bar{ϕ}, \dots, ϕ (γ_{s}) - \bar{ϕ}

, with

\dim E = q \leq s - 1

. If

q \leq r

, then the problem has as trivial solutions any

F

satisfying

E \subset F

, and therefore,

D = 0

. In this case we can choose

v_{1}, \dots, v_{r}

in such a way that

v_{1}, \dots, v_{q}

so as to be a basis of

E

. When

q > r

, in order to minimize

D

, it is clear that we must have

F ⊊ E

, and thus,

v_{1}, \dots, v_{r}

is spanned by

ϕ (γ_{1}) - \bar{ϕ}, \dots, ϕ (γ_{s}) - \bar{ϕ}

; therefore, we can define the

r \times s

matrix

V

with elements

v_{l i}

given by

v_{i} = \sum_{l = 1}^{s} v_{l i} (ϕ (γ_{s}) - \bar{ϕ}) i = 1, \dots, r .

(43)

Next, we shall define the Gram or kernel matrix corresponding to the kernel k and the sample

γ_{1}, \dots, γ_{s}

as the

s \times s

matrix

K

, the elements of which are given by

k_{i j} : = 〈 ϕ (γ_{i}), ϕ (γ_{j}) 〉 =

k (γ_{i}, γ_{j})

, and the

s \times s

centering data matrix defined as

H_{s} : = I_{s} - \frac{1}{s} 1_{s} 1_{s}^{⊺}

, where

I_{s}

is the

s \times s

identity matrix,

1_{s}

is an m-dimensional column vector with all components equal to one, and the symbol ⊺ stands for the transpose matrix operator. Then, if we define

\bar{K} = H_{s} K H_{s},

(44)

where the elements are

{\bar{k}}_{i j} : = 〈 ϕ (γ_{i}) - \bar{ϕ}, ϕ (γ_{j}) - \bar{ϕ} 〉

, we shall have

D = tr (\bar{K}) - tr (\bar{K} V V^{⊺} \bar{K}),

(45)

with the orthonormal basis condition, in matrix form, given by

V^{⊺} \bar{K} V = I_{r} .

(46)

However, since

\bar{K}

is a symmetric and positive semidefinite matrix, we can define its symmetric square root

{\bar{K}}^{1 / 2}

, and then, if we let

U = {\bar{K}}^{1 / 2} V

, we shall have

D = tr (\bar{K}) - tr ({\bar{K}}^{1 / 2} U U^{⊺} {\bar{K}}^{1 / 2}) = tr (\bar{K}) - tr (U^{⊺} \bar{K} U),

(47)

with the condition

U^{⊺} U = I_{r} .

(48)

It is straightforward to prove that

D

is minimized if the r columns of

U

,

u_{1}, \dots, u_{r}

, are the normalized eigenvectors

\bar{K} u_{i} = λ_{i} u_{i}, with λ_{1} \geq λ_{2} \geq \dots \geq λ_{r} \geq \dots \geq λ_{s} = 0,

(49)

corresponding to the largest eigenvalues of

\bar{K}

. Observe also that

〈 ϕ (γ_{i}) - \bar{ϕ}, v_{j} 〉 = \sum_{l = 1}^{s} v_{l j} {\bar{k}}_{i l} .

(50)

Therefore, in the final output, the Cartesian coordinates of

γ_{i}

,

i = 1, \dots, s

, will be given, in matrix notation, by

A : = {(〈 ϕ (γ_{i}) - \bar{ϕ}, v_{j} 〉)}_{s \times r} = \bar{K} V,

(51)

or

A = {\bar{K}}^{1 / 2} U .

(52)

Observe that if the

A_{i}

,

i = 1, \dots, s

, the rows of the matrix

A

are

d^{2} (Π_{F} (ϕ (γ_{i}) - ξ), Π_{F} (ϕ (γ_{j}) - ξ)) = (A_{i} - A_{j}) {(A_{i} - A_{j})}^{⊺},

(53)

and after some straightforward computations, we shall obtain

Δ (v_{1}, \dots, v_{r}, ξ) : = 2 s tr (U^{⊺} \bar{K} U),

(54)

which shows that the criteria (a) and (b) are equivalents.

Notice that we can represent any additional point

η \in Θ

in the output plot. Specifically, to find the Cartesian coordinates of the representation of this point

η

in the final output, we compute the projection of the centered

ϕ

-image of

η

into

F

. For this calculation, notice that

〈ϕ (η) - \bar{ϕ}, v_{k}〉 = \sum_{i = 1}^{s} v_{i k} 〈ϕ (η) - \bar{ϕ}, ϕ (γ_{i}) - \bar{ϕ}〉 .

Introducing the vector

Z = {(k (η, γ_{i}))}_{s \times 1},

(55)

from the last expression, we shall have

{(〈ϕ (η) - \bar{ϕ}, v_{k}〉)}_{1 \times r} = (Z^{⊺} - \frac{1}{s} 1_{s}^{⊺} K) H_{s} V .

(56)

Notice that Kernel PCA (KPCA) uses only the values of the kernel evaluated at

γ_{1} \dots, γ_{s}, η

, since the algorithm formulates the reduction of the dimension in the feature space only through the kernel evaluation at these points.

To facilitate the intelligibility of the KPCA output plot, we can represent in it the

σ (t)

curve’s solution of (35), which indicates, locally, the maximum variation directions of

\hat{f}

, or alternatively, the corresponding gradient vector field, given in (34).The curve

σ (t) = k (\cdot, θ (t))

, with

θ (t)

, is the solution of (35). Then, we can compute the projections of this curve onto the subspace

F

spanned by (43). If we define

Z_{t} = {(k (θ (t), γ_{i}))}_{m \times 1},

(57)

taking into account Equation (56), the induced curve,

\tilde{σ} (t)

, in

F

expressed in matrix form is given by the row vector

\tilde{σ} {(t)}_{1 \times r} = (Z_{t}^{⊺} - \frac{1}{m} 1_{m}^{⊺} K) (I_{m} - \frac{1}{m} 1_{m} 1_{m}^{⊺}) V,

(58)

where

Z_{t}

is of the form (57).

We can also represent the gradient vector field of

\hat{f}

, equal to the tangent vector field corresponding to

σ (t)

, through its projection onto the subspace

F

. The tangent vector at

t = t_{0}

, if

θ_{0} = ϕ^{- 1} \circ σ (t_{0})

, is given by

\frac{d σ}{d t} |_{t = t_{0}}

, and its projection, in matrix form, is

\frac{d \tilde{σ}}{d t} |_{t = t_{0}} = \frac{d Z_{t}^{⊺}}{d t} |_{t = t_{0}} (I_{m} - \frac{1}{m} 1_{m} 1_{m}^{⊺}) V,

(59)

with

\frac{d Z_{t}^{⊺}}{d t} |_{t = t_{0}} = {(\frac{d Z_{t}^{1}}{d t} |_{t = t_{0}}, \dots, \frac{d Z_{t}^{m}}{d t} |_{t = t_{0}})}^{⊺} .

(60)

Taking into account (30), we obtain

\begin{matrix} \frac{d Z_{t}^{i}}{d t} |_{t = t_{0}} & = \frac{d K (γ (t), γ_{i})}{d t} |_{t = t_{0}} \\ = \sum_{α = 1}^{m} D_{α} K (γ_{0}, γ_{i}) \frac{d x^{α}}{d t} |_{t = t_{0}} \\ = \sum_{α = 1}^{m} D_{α} K (γ_{0}, γ_{i}) \sum_{β = 1}^{m} g^{α β} (γ_{0}) D_{β} f (γ_{0}) \\ = \sum_{α = 1}^{m} \sum_{β = 1}^{n} D_{m + α} K (γ_{i}, γ_{0}) g^{α β} (γ_{0}) D_{β} f (γ_{0}), \end{matrix}

(61)

as a result, which basically coincides with [16]. Figure 3 graphically shows the method outlined in the present subsection.

2.4.2. PCA in a Tangent Space

Now, we shall require that the final Euclidean distance in the plot be similar to the Riemannian manifold distance on

Θ

; in this case we shall first map the principal points

γ_{1}, \dots, γ_{s} \in Θ

onto a convenient tangent space through the inverse of the exponential map corresponding to Levi–Civita connection; see Appendix A for additional details. In this Euclidean space, we shall use the standard PCA method. First, we will fix a point

C \in Θ

, which we shall refer to as a reference point on which to base the graphical representation. For reasons that will become apparent later, it seems convenient to choose as

C

the Riemannian center of mass of the points

γ_{j}, j = 1, \dots, s

, that is, a point

C

such that it is a minimum of the map

α \mapsto H (α)

, where

H (α) = \sum_{j = 1}^{s} ρ^{2} (α, γ_{j}) .

(62)

Here,

ρ^{2} (α, γ_{j})

is the square of the Riemannian distance between the points

α

and

γ_{j}

. This point is uniquely defined in many interesting applications; among others, see [8,17].

Then, the principal points

γ_{1}, \dots, γ_{s}

, and the remaining points that we are interested in plotting

γ_{s + 1}, γ_{s + 2}, \dots \in Θ

are mapped onto the tangent space

T_{C} Θ

through the inverse of the Levi–Civita connection exponential map,

\exp_{C}^{- 1}

, which is well defined (almost everywhere) in most applications. Figure 4 helps us to clarify the role of the inverse of the exponential map. The manifold is the (radius R) basket ball, with the Riemannian metric induced by the Euclidean geometry. See Appendix A for further details.

Principal points play an important role since the computation of the above-mentioned center of mass (if, as we have recommended, it is taken as the reference point

C

) and the projector

π_{L}

is only concerned with them, independently of the other studied points in

Θ

. The selection of the principal points will depend, of course, on the particular problem, and the possibilities are unlimited. We select as principal points, for example, pairs of points which define the gradient of a particular real-valued function on

Θ

, computed at certain fixed points. It would be convenient to select these points interactively with suitable computer software. Therefore, we obtain

W_{1} = \exp_{C}^{- 1} (γ_{1}), \dots, W_{s} = \exp_{C}^{- 1} (γ_{s}), W_{s + 1} = \exp_{C}^{- 1} (γ_{s + 1}), \dots,

(63)

and, thus, our original points

γ_{1}, \dots, γ_{s}, γ_{s + 1}, \dots \in Θ

are now represented as points in a m-dimensional Euclidean space

T_{C} Θ

, with scalar product corresponding to the basis

{(\partial / \partial θ^{α})}_{C}

α = 1, \dots, m

, given by the information matrix at

C

, where

{(\frac{\partial}{\partial θ^{α}})}_{C} f = D_{α} (f \circ ϕ) (ϕ^{- 1} (C)) α = 1, \dots, m .

(64)

We shall use the same symbol,

W

, to denote the points at

T_{C} Θ

and their coordinates with respect to the basis vector field mentioned above, in matrix notation, as a vector

m \times 1

. It is important to note that

\exp_{C}^{- 1}

, considered a map between two metric spaces, preserves the distance between any point and

C

, although it does not preserve, in general, distances between arbitrary points, in such a way that the distortion between both distances is small for points close to

C

.

More precisely, let

B (C, R)

be a Riemannian convex ball of radius

R > 0

, and let

θ_{α}, θ_{β} \in B (C, R / 2) \subset Θ

,

θ_{α} \neq θ_{β}

. We shall have

\frac{| ρ^{2} (θ_{α}, θ_{β}) - | \exp_{C}^{- 1} (θ_{α}) - \exp_{C}^{- 1} (θ_{β}) |_{C}^{2} |}{ρ^{2} (θ_{α}, θ_{β})} \leq \frac{1}{3} D R^{2} {1 + O (R^{2})},

(65)

where

D = max | K (X, Y) |

on

\bar{B} (γ, R)

, the closure of the Riemannian ball

\bar{B} (γ, R)

, with

K (X, Y)

being the Riemannian sectional curvature. Observe that the relative error to approximate

ρ^{2} (θ_{α}, θ_{β})

, the square of the Riemannian distance between the points

θ_{α}

and

θ_{β}

, by

| \exp_{C}^{- 1} (θ_{α}) - \exp_{C}^{- 1} (θ_{β}) |_{C}^{2}

is of the order of

R^{2}

. See Appendix A for details. To facilitate the reading of the article for the common reader, we have obtained (65), a formula more or less familiar to readers with a background in Riemannian geometry.

This fact suggests the convenience of choosing

C

as the center of a small Riemannian ball that includes the majority of the principal points, with the Riemannian center of mass of the principal points being a good candidate since, in that case, we shall have, lower error bounds for the metric distortions on average. At any rate, it is possible to measure the global distortion introduced through an analogue of the correlation cophenetic coefficient.

Then a usual PCA is performed over the representation of the principal points at

T_{C} Θ

,

W_{1}, \dots, W_{s}

, taking into account the scalar product in the tangent space to obtain a q-dimensional affine manifold,

L

, with

q = 2

or

q = 3

, which allows for an optimal representation of the principal points. Since the scalar product matrix at

T_{C} Θ

, with respect to the vector basis mentioned above, is

G (C)

, given by (7), the following diagonalization must be solved:

C u = λ G^{- 1} (C) u .

(66)

Here,

C

is the covariance matrix (or any multiple) of the coordinates of the principal points of representation,

W_{1}, \dots, W_{s}

, in the tangent space,

G^{- 1} (C)

is the inverse of the metric tensor matrix at

C

,

λ

is an eigenvalue, and

u

is a corresponding eigenvector. If

U

is an

m \times q

matrix whose columns are the normalized eigenvectors corresponding to the q highest eigenvalues, then the q principal coordinates, in

L

, of a point

W

, in matrix notation, as a column vector

q \times 1

, are given by

U^{⊺} W

. In other words, if

π_{L}

is the projection operator corresponding to the matrix

U

, supplied by PCA, then all statistical objects (not only the principal points) can be represented in

S

as

π_{L} \circ \exp_{C}^{- 1} (θ_{1}), \dots, π_{L} \circ \exp_{C}^{- 1} (θ_{s}), \dots, π_{L} \circ \exp_{C}^{- 1} (θ_{s + 1}), \dots \in L .

(67)

In summary, the whole procedure can be summarized by

θ_{i} \overset{\exp_{C}^{- 1}}{\to} W_{i} \overset{PCA}{\to} ζ_{i} = U^{⊺} W_{i} .

(68)

We shall refer hereafter to this procedure as Intrinsic Data Analysis, IDA for short. It is possible to carry out the steps mentioned above, replacing the barycenter

C

by another point

C^{'}

, which may be easier to compute or more suitable in a specific application, although in this case, the metric distortion caused by

\exp_{C}^{- 1}

will probably increase.

Once the projector

π_{L}

has been computed, each single point on the tangent space can be mapped in

L

, and the curves

σ (t) = K (\cdot, γ (t))

and their tangent vector field can also be mapped on

L

in a straightforward way.

2.5. Physical Applications

In the mathematical framework succinctly presented above, we can define the notion of extended information, codified by the estimation

ω

, obtained from an information source of the true parameter

θ

, relative to this true value of the parameter, and referred to as an arbitrary point

θ_{0}

, as

\begin{matrix} I_{ω} (θ) & = - ι (log h (ω, θ) - log h (ω, θ_{0})) \\ = - ι (ln r (ω, θ) + i ζ (ω, θ) - ln r (ω, θ_{0}) - i ζ (ω, θ_{0})), \end{matrix}

(69)

where log denotes a version of the complex logarithm defined to satisfy

log h (ω, θ) = ln r (ω, θ) + i ζ (ω, θ)

, with ln being the standard natural logarithm, and

ι

being a constant that determines the units of information with which we will work. The implicit dependence of (69) on

θ_{0}

is omitted from the notation since its choice will not play any further role when we calculate its gradient in the parametric space. Additionally, (69) will remain invariant under appropriate data changes, and fixed

ω

is also invariant under coordinate changes in the parametric manifold, since it is a scalar field on

Θ

.

The information provided by a source external to the observer is represented within him, allowing him to increase his understanding of the objects. Although we can consider different levels and types of the said representation, we will focus on two critical aspects: the parameter space with its natural extended information geometry and the observer’s ability to construct plausibility regarding the true value of the parameter,

Ψ

, in the parameter space

Θ

. This estimation is fundamentally a complex square root of a specific kind of subjective conditional probability density with respect the Riemannian volume, induced by the information metric over the parameter space,

Θ

, up to a normalization constant. Specifically, we shall write at the beginning

\int_{Θ} Ψ (θ) \bar{Ψ (θ)} V (d θ) = a > 0,

(70)

although we shall be particularly interested in the case

a = 1

. Observe that in (70), we are integrating with respect to the Riemannian measure defined in (7); therefore, this expression is invariant under coordinate changes. Furthermore, if we intend to define a probability on the parametric manifold, interpretable as a plausibility about the true value of the parameter, we can take the function

Ψ \bar{Ψ}

as the Radon–Nikodym derivative of said probability with respect to the Riemannian volume, also a measure in the same parameter space. Both measures are independent of the coordinate system used and, therefore,

Ψ \bar{Ψ}

will be an invariant scalar field on the parametric manifold

Θ

. Then, we can simply define the information encoded by the subjective plausibility

Ψ

on the parameter space

Θ

relative to the true parameter

θ

as

Λ (θ) = - ι log (Ψ (θ)) .

(71)

This quantity (71) also remains invariant under coordinate changes in the parametric manifold, being another scalar field in

Θ

; see also [4].

The Variational Principle

In this context, trusting that many of the abilities of the observer have been efficiently shaped by natural selection in the process of biological evolution, we can propose that the subjective information mentioned above adjusts in some way to the information provided by the source and, in particular, satisfies the following variational principle (see also [4]):

Ω (Ψ) = \int_{Θ} {| grad I_{ω} (θ) - grad Λ (θ) |}^{2} Ψ (θ) \bar{Ψ (θ)} V (d θ) .

(72)

Here, is a minimum, or at least stationary, and subject to the constraint (70), and assuming that

Ψ

and its gradient,

grad Ψ

, vanish at the boundary

\partial Θ

or at infinity, which is the case for the model, we have strong reason to believe that the true parameter

θ

should belong far from the boundary, clearly inside of

Θ

.

Observe that the functional

Ω

is equal to the expected value corresponding to the probability in

Θ

given by the density

Ψ \bar{Ψ}

with respect to the Riemannian volume induced by the metric (24),

V (d θ)

, of the square of the norm, corresponding to a complexified vector field in a Riemannian manifold, as indicated in Appendix A. Since the difference between the gradients

grad I_{ω} (θ)

and

grad Λ (θ)

is a complexified vector field, i.e., if

grad I_{ω} (θ) - grad Λ (θ) = W_{1} (θ) + i W_{2} (θ)

, then

| grad I_{ω} {(θ) - g r a d Λ (θ) |}^{2} = | W_{1} {(θ) |}^{2} + {| W_{2} (θ) |}^{2}

. Observe that (72) is invariant under coordinate changes in

Θ

since the square of the norm inside the integral is invariant and

Ψ (θ) \bar{Ψ (θ)} V (d θ) = Ψ (θ) \bar{Ψ (θ)} \sqrt{g (θ)} d θ

is as well.

Notice also that the source is considered to be objective or at least, strictly speaking, intersubjective, while the parameter space, with its geometric properties, is in some sense built by the observer and is therefore subjective, although strongly conditioned by the source. In future work, some additional restrictions will be added to the variational principle (72) with the aim of more accurately modeling the observation process as a whole.

Any change in the information encoded by

ω

caused by considering a change at the source in the parameter space should correspond to a change in the subjective information proposed by the observer. For this reason, we propose that the squared difference of both gradients

grad I_{ω}

and

grad Λ

would, on average, result in

Ω

being as small as possible, at least locally.

This variational principle is an extension of a previous one presented by the authors in [4], where only regular parametric statistical models with the standard information metric were considered. Based on the former variational principle applied to these models and playing only with basic statistical tools, in particular the information carried by the data in a given statistical model, for instance, see [18,19,20], we obtain a probability density in the parameter space

Θ

by solving a system of partial differential equations. This probability can be viewed as a Bayesian posterior probability over all possible probabilistic mechanisms that have generated the data, probabilistic mechanisms identifiable with the parameters. Furthermore, if we apply this procedure of analyzing data to the simple statistical model corresponding to a multivariate normal distribution with a constant covariance matrix, a model that can be considered at least as an approximation of many regular statistical models for large samples via extensions of the Central Limit Theorem, we obtain, from those above-mentioned partial differential equations in the parameter space, a specific differential equation already studied in Physics and known as the stationary (time-independent) Schrödinger equation applied to a quantum harmonic oscillator.

We will soon present another work where the variational principle (72) will be resolved with, at least, the restriction (70).

3. Discussion

Understanding the complex relationship between data analysis and physics necessitates a comprehensive examination of the foundational elements that underpin observation and modeling. This paper emphasizes the intrinsic connection between information sources—conceptualized as entities providing sequences of data—and the physical models that aim to represent reality with causal nuance. Central to this discourse is the role of observation, which is depicted as a triadic interaction involving the studied object, the knowing subject, and the environment.

By dissecting these elements within a mathematical framework, this work bridges the cognitive process of observation with the geometric and probabilistic structures that characterize physical models. The observation process, as articulated, hinges on the recognition that all relevant phenomena are ultimately situated within the observer’s consciousness. However, simplifying assumptions facilitates the development of a formal model, where entities involved in observation—objects, subjects, and environments—are represented through elements of a parameter space

Θ

. This space, modeled as a smooth manifold, encapsulates the parameters that characterize the underlying data-generating processes.

Such a manifold structure allows for the application of differential geometry, enabling the exploration of the intrinsic properties of models through tools like the Riemannian metric induced by divergence functions or distance measures. The introduction of complex-valued functions

h (ω, θ)

and their exponential representations illustrates how models of data sources can be embedded within a Hilbert space framework. By defining a family of functions

F_{Θ}

, we leverage the geometric interpretation of statistical models, where distances between models—quantified via functions such as the Hilbert metric

D_{2}

or the Hellinger distance

d_{H}

—are instrumental in understanding model complexity and distinguishability.

The construction of a Riemannian metric from these divergence measures aligns with the principles of information geometry, providing a natural geometric framework for analyzing the parameter space. Specifically, the metric tensor components are derived from the second derivatives of divergence functions, which encapsulate the local curvature of the parameter space. This curvature, reflecting the model’s complexity and sensitivity to parameter variations, influences estimation procedures and inference accuracy.

The Fisher information matrix emerges as a pivotal component, quantifying the amount of information that an observable data source contains about the parameters. When combined with phase-related terms arising from complex exponentials (e.g., the functions

ζ (ω, θ)

), the metric captures both probabilistic and phase information—concepts reminiscent of quantum information geometry. The extension to quantum contexts underscores the universality of such geometric structures. Quantum states, represented by vectors in complex Hilbert spaces, inherently embody both amplitude and phase.

The metric structures derived from complex functions thus encode not only classical probabilistic information—determinable via the Fisher information—but also quantum phases that influence phenomena such as interference and entanglement. This dual encoding exemplifies how geometric tools can unify seemingly disparate layers of information, providing a comprehensive framework for understanding complex systems. This underscores the profound role of geometric and probabilistic structures in modeling physical phenomena and data sources.

Formalizing observation through parameter spaces endowed with Riemannian metrics derived from divergence measures provides a versatile toolkit applicable to both classical and quantum domains. The approach highlights that models are not merely static representations but dynamic entities with curvature and structure that influence estimation, inference, and ultimately, our understanding of reality. Integrating these geometric insights into physical modeling enhances both the conceptual framework and the practical methodologies employed in analyzing complex systems, thereby facilitating more nuanced and precise scientific investigations.

We present a method for effectively connecting two distinct ways of representing data through a specialized mathematical transformation, referred to as a “feature map.” When this feature map is one-to-one—ensuring that each point in the original space corresponds uniquely to a single point in the transformed space—we can establish a reliable method for transferring various functions and objects from the original domain to the new one.

This approach is further enriched by the kernel method, which allows us to implicitly work within a high-dimensional feature space without explicitly computing the feature map. Specifically, kernel functions enable us to compute inner products in the transformed space directly from the original data, facilitating efficient computations even in infinite-dimensional feature spaces. This capability ensures that mathematical entities, such as functions, vectors, and matrices, which characterize the data, can be consistently interpreted within this new context, preserving all information and structural properties.

Moreover, when considering data residing on Riemannian manifolds—smooth, curved spaces that generalize Euclidean geometry—these ideas become even more powerful. Riemannian manifolds provide a natural setting for modeling complex data that intrinsically possesses geometric structure, such as shapes, surfaces, or biological structures. In such contexts, the notion of a feature map can be extended to respect the manifold’s geometric properties, allowing for transformations that preserve curvature and intrinsic distances.

By leveraging kernels adapted to Riemannian manifolds—so-called Riemannian kernels—we can perform implicit high-dimensional embedding that accounts for the manifold’s geometry, enabling meaningful comparisons and analyses of data points lying on curved spaces. This synergy between kernel methods and Riemannian geometry facilitates a more profound understanding of complex data structures, supporting tasks such as classification, regression, and visualization within intrinsically curved spaces. Overall, the integration of Riemannian manifolds into kernel-based feature mappings broadens the scope of data representation, ensuring that the geometric essence of the data is maintained throughout the transformation process.

Then, we present a sophisticated approach to analyzing data on Riemannian manifolds, combining differential geometry, kernel methods, and principal component analysis (PCA). At its core, the methodology emphasizes understanding the intrinsic geometric structure of the data, which is crucial when the data naturally resides in nonlinear, curved spaces rather than Euclidean ones. This approach is particularly relevant in fields where the data’s inherent geometry cannot be ignored.

One of the central themes revolves around the flow lines of the gradient vector field, denoted as curves

σ (t)

, which are solutions to a first-order differential system. These curves represent the integral flow of the gradient of a function q, effectively indicating the directions of maximum variation of q on the manifold. Geometrically, these curves intersect the level hypersurfaces orthogonally, much like contour lines on a geographic map, thereby providing a visual and intuitive understanding of the function’s local behavior. The analysis of these flow lines is essential in understanding the local structure of the data and in identifying directions that capture the most significant variations, which can guide dimension reduction strategies.

The representation of these flow lines, especially in low-dimensional projections, is complicated by the manifold’s nonlinear nature. We discuss the challenges associated with dimension reduction, highlighting the potential distortion when projecting high-dimensional manifold data into Euclidean spaces. This issue is addressed through the use of the exponential and logarithmic maps, which transfer points between the manifold and its tangent spaces, preserving local geometric properties. The inverse exponential map,

\exp^{- 1}

, maps points from the manifold to the tangent space at a reference point, facilitating the application of Euclidean PCA in the tangent space. The choice of the Riemannian center of mass as the reference point is significant, as it ensures that the PCA captures the dominant variation directions in a manner faithful to the manifold’s geometry.

Kernel methods, particularly Kernel PCA (KPCA), are integrated into this framework to perform nonlinear dimension reduction in the feature space induced by a kernel function. The kernel encapsulates the manifold’s intrinsic geometry implicitly, allowing for the analysis of data that may be difficult to embed linearly. We describe how the kernel matrix K, constructed from evaluations of the kernel function at data points, encodes geometric relationships, such as distances and angles, in a high-dimensional feature space. This means that the kernel matrix captures the relationships between data points in a way that respects the manifold’s geometry.

Eigendecomposition of the kernel matrix reveals principal directions of variation, which are then projected back into the original data space. This process enables the visualization of complex structures, such as flow lines and gradient vector fields, in low-dimensional representations that respect the manifold’s intrinsic distances. Furthermore, the methodology incorporates a nuanced treatment of the curvature effects inherent in Riemannian manifolds. The approximation of the squared distances using the inverse exponential map, with bounds related to sectional curvature, ensures that local linearizations remain valid within convex neighborhoods. This means that within these neighborhoods, the manifold can be approximated as a Euclidean space, allowing for the application of linear methods such as PCA.

By selecting a central point, such as the Riemannian barycenter, the analysis minimizes distortion, resulting in more faithful low-dimensional visualizations. The tangent space PCA—referred to as intrinsic PCA—operates on the mapped principal points, allowing for an eigendecomposition that respects the metric tensor, thus accounting for the manifold’s local curvature. This combined geometric and statistical framework underscores the importance of respecting the data’s underlying structure. By integrating differential geometric concepts with kernel methods and PCA, the approach achieves a balance between local fidelity and global interpretability. It enables the extraction of meaningful features, flow patterns, and principal directions that are invariant under the manifold’s curvature, resulting in more robust and insightful analyses.

This methodology exemplifies a comprehensive approach to manifold data analysis, emphasizing the importance of intrinsic geometry. The use of flow lines of the gradient, the application of exponential and logarithmic maps, the implementation of kernel-based reduction, and the intrinsic PCA all serve to capture the complex structure of data residing in nonlinear spaces. Such techniques are increasingly vital in modern data science, where high-dimensional, curved data structures are prevalent, and a faithful representation of their geometry is essential for meaningful interpretation.

Building on this synthesis, the geometric framework we propose opens new avenues for understanding the dynamics of information in physical systems. By conceptualizing inference processes as variational principles operating on Riemannian manifolds, we suggest that the evolution of knowledge—both in scientific models and in fundamental physics—may be described by wave-like phenomena governed by invariant geometric structures. This perspective aligns with the notion that physical laws themselves can be viewed as extremal principles on curved manifolds, where the flow of information and the evolution of states are intrinsically linked. Such a viewpoint encourages us to explore whether the principles underlying quantum mechanics, general relativity, and thermodynamics can be unified within a common geometric language rooted in information theory.

Consequently, this approach not only enriches our theoretical understanding of inference but also suggests a deeper, possibly holographic, nature of reality, where the fabric of spacetime and the behavior of matter emerge from the geometry of information. Future research may reveal whether these geometric and variational insights can lead to new formulations of physical laws, bridging the gap between abstract information geometry and the tangible fabric of the universe.

Of course, this paper is only the first step towards the above-mentioned deep relationship between physical and data analysis. We would certainly like to have a clear path already outlined for developing the variational principle and connecting it with the major existing physical theories. For the time being, in our immediate work, we will limit ourselves to attempting to derive some existing basic equations—the Schrödinger, Klein-Gordon, and Dirac equations—and to the extent possible, we will explore their compatibility with general relativity and thermodynamics. Other relationships, such as decoherence, entanglement, and von Neumann’s entropy, for example, will be addressed in future work.

Author Contributions

Conceptualization, D.B.-C. and J.M.O.; writing—original draft preparation, D.B.-C. and J.M.O.; writing—review and editing, D.B.-C. and J.M.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study did not require ethical approval.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

We sincerely appreciate the insightful feedback from the reviewers, which has significantly sharpened the clarity of the ideas presented in our article. Their constructive critique has not only highlighted our current limitations but has also sparked an inspiring challenge that will drive our future research forward.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Appendix A.1. Differential Geometry Remarks

In this appendix subsection, we will briefly review some well-known concepts that can be further explored in classical references on differential geometry, data analysis, and statistics, such as [8,9,13], among many others.

Consider an m-dimensional smooth real manifold

Θ

, and let r and s be non-negative integers. A is a

(r, s)

-tensor field, r times contravariant and s times covariant, in an open set U of a manifold

Θ

, and, roughly speaking, is a multilinear map defined at

\begin{matrix} \underset{︸}{T_{θ}^{*} Θ \times \dots \times T_{θ}^{*} Θ} & \times & \underset{︸}{T_{θ} Θ \times \dots \times T_{θ} Θ} & ⟶ R, \\ r & s \end{matrix}

(A1)

depending smoothly on

θ \in U

. Taking into account that a coordinate system

(θ^{1}, \dots, θ^{m})

induces a basis vector field in each tangent space,

T_{θ} Θ

, denoted as

\partial / \partial θ^{1}, . . ., \partial / \partial θ^{m}

, and also a basis vector field in each dual,

T_{θ}^{*} Θ

, denoted as

d θ^{1}, \dots, d θ^{m}

, the components of the tensor field A, corresponding to these basis fields, will be denoted by

A_{α_{1}, \dots, α_{s}}^{α_{1}, \dots, α_{r}}

, and when we change to another coordinate system

({\bar{θ}}^{1}, \dots, {\bar{θ}}^{m})

, these components are going to change accordingly to

{\bar{A}}_{β_{1}, \dots, β_{s}}^{α_{1}, \dots, α_{r}} = \frac{\partial {\bar{θ}}^{α_{1}}}{\partial θ^{i_{1}}} \cdot \dots \cdot \frac{\partial {\bar{θ}}^{α_{r}}}{\partial θ^{i_{r}}} \cdot \frac{\partial θ^{j_{1}}}{\partial {\bar{θ}}^{β_{1}}} \cdot \dots \cdot \frac{\partial θ^{j_{s}}}{\partial {\bar{θ}}^{β_{s}}} \cdot A_{j_{1}, \dots, j_{s}}^{i_{1}, \dots, i_{r}} .

(A2)

In (A2), we have used the summation convention of repeated indices; all these quantities are evaluated at

θ

, and we have used classical notation. If

r = s = 0

, then A is a smooth function on

Θ

, which is just an invariant with respect to coordinate changes. If

r = 0

, we just say that A is an s-covariant tensor field, while if

s = 0

, we just say that A is an r-contravariant tensor field. We shall call intrinsic any object or property independent of the coordinate system in

Θ

.

Let us recall that if

Θ

and

Ξ

are smooth real manifolds, and q is a smooth mapping on an open subset

U \subset Θ

to values in

Ξ

, then this map induces, for each

θ \in U

, a linear map between

T_{θ} Θ

and

T_{q (θ)} Ξ

, which we will call the differential of q at

θ

and denote by

q_{*}

, such that if

η \in T_{θ} Θ

, then we define

(q_{*} η) h = η (h \circ q)

, for all smooth function h on a neighborhood of

q (θ)

.

Also, we recall that a curve σ in

θ

is a smooth map of an open interval

I \subset R

in

θ

. When we talk about a

σ

curve of a closed interval

[a, b]

in

Θ

, we will assume that the domain of the curve is an open

R

containing

[a, b]

.

Given a m-dimensional real manifold

Θ

, let

T Θ

denote the tangent bundle (the set of all the pairs such that

(θ, ξ)

with

ξ \in T_{θ} Θ

and having the structure of a

2 m

-dimensional manifold), and let

π : T Θ \to Θ

be the projection function, where

π (θ, ξ) = θ

if

ξ \in T_{θ} Θ

. Let I be an open interval in

R

and

σ : I \to Θ

be a smooth curve in

Θ

. A vector field

V

over

σ

is a map

V : I \to T Θ

such that

π \circ V = σ

, i.e.:

V (s) \in Θ_{σ (s)}

for all

s \in I

.

For each point in the

σ

domain,

t \in I

, we define the tangent vector to σ at t as the vector

T_{σ (t)}

defined from

T_{σ (t)} = σ_{*} ({(\frac{d}{d s})}_{t}) or, simply, T = σ_{*} (\frac{d}{d s})

, where

\frac{d}{d s}

is the basis vector field corresponding to the identity chart in

R

. The tangent field can also be denoted by

σ^{'} (t) = σ_{*} (\frac{d}{d s})

. For simplicity, we often overuse notation and identify the tangent fields on curves with their images, that is, we will write

σ^{'}

as

t \mapsto σ^{'} (t)

.

A Riemannian manifold will be a manifold equipped with a Riemannian metric, that is, a second-order covariant smooth tensor field, which is additionally positive definite at each tangent space. In simple words, we have a scalar product defined in each tangent space that varies smoothly when we move to different points in the parametric space, which will be denoted by

{〈, 〉}_{θ}

. The local version of the Riemannian metric in

Θ

will commonly be expressed, using the summation convention of repeated indices, as

d s^{2} = g_{μ ν} (θ) d θ^{μ} d θ^{ν}

, as in (8), where

g_{μ ν} (θ) = {〈 \partial / \partial θ^{μ}, \partial / \partial θ^{ν} 〉}_{θ}

.

Corresponding to this fundamental tensor, whose components are given by

g_{μ ν} (θ)

, there are the well-known operations of raising or lowering indices, which in fact allow us to identify covariant with contravariant tensor fields, or the opposite. For a particular index and for every

θ \in Θ

, these operations are isomorphisms between the tangent bundle

T Θ

and its dual, the cotangent bundle,

T^{*} Θ

, the flat map (lowering indexes), symbolized by ♭ and defined through

♭ (X) (Y) = 〈 X, Y 〉, \forall X, Y \in T Θ

, or the opposite, between the dual of the tangent bundle

T^{*} Θ

and its dual, the bidual bundle of

T Θ

canonically identifiable with the tangent bundle itself,

T Θ

, the sharp map (raising indexes), symbolized by ♯ and defined through

♯ (ω) (Y) = 〈 ♯ (ω), Y 〉, \forall ω \in T^{*} Θ and \forall Y \in T Θ

. In component notation, this is achieved simply by multiplying the components of a specific tensor by the fundamental covariant tensor field

g_{μ ν}

at

T Θ

or its inverse

g^{i j}

, the fundamental contravariant tensor field at the dual of the tangent bundle, the cotangent bundle

T^{*} Θ

.

The gradient of a smooth function q in an open neighborhood of a point

θ

of a Riemannian manifold

Θ

is a vector field

ξ = grad (q)

such that

{〈 ξ, ζ 〉}_{θ} = ζ_{θ} (q)

for all vector field

ζ

, where

ζ_{θ} (q)

is the directional derivative of q in the direction given by

ζ

at

θ

. Observe that in the previous notation

ξ = grad (q) = ♯ (\nabla q)

, it is well known that for

| ζ_{θ} | = 1

, taking into account the definition of the gradient and the Cauchy–Schwarz inequality, we have

| ζ_{θ} (q) | \leq | grad (q) |

, and the maximum is reached by the unitary vector

ζ_{θ} = grad (q) / | grad (q) |

. On the other hand, it will be possible to extend these mathematical objects to the complex case. If

ϕ (θ)

is a complex-valued function, and if

Re

and

Im

are the standard real and imaginary parts of a complex number, if we let

a (θ) = Re (ϕ (θ))

and

b (θ) = Im (ϕ (θ))

, we may define

grad (ϕ) = grad (a) + i grad (b)

, and we can extend the scalar product in each tangent space to complexified vector fields

U (θ) = X_{1} (θ) + i Y_{1} (θ)

and

W (θ) = X_{2} (θ) + i Y_{2} (θ)

, defining the scalar product as

〈 U, W 〉 = 〈 X_{1}, X_{2} 〉 + 〈 Y_{1}, Y_{2} 〉

, at each point

θ

. The corresponding square of the form will be

{| U |}^{2} = 〈 X_{1}, X_{1} 〉 + 〈 Y_{1}, Y_{1} 〉 = | X_{1} |^{2} + {| Y_{2} |}^{2}

.

The differentiation of vector fields involves the choice of a connection, i.e., a rule which associates to each point

θ \in Θ

a tangent vector

ξ \in T_{θ} Θ

and a smooth tensor field

V

defined at least in a neighborhood of

θ

a vector

\nabla_{ξ} V \in T_{θ} Θ

, such that

\nabla_{ξ} (V + W) = \nabla_{ξ} V + \nabla_{ξ} W

and

\nabla_{ξ} (q V) = (ξ q) V + q \nabla_{ξ} V

,

V, W

being smooth vector fields and q being a smooth function in the neighborhood of

θ

. It is also required that

\nabla_{ξ} V

be linear in

ξ

and that

\nabla_{W} V

be a smooth vector field. The vector

\nabla_{ξ} V

is commonly called the covariant derivative of

V

with respect to

ξ

. At this point, let us recall that we can define the divergence of a smooth vector field

W

as the smooth real-valued function defined, at

θ

, as

d i v (W) (θ) = t r a c e (ξ \mapsto \nabla_{ξ} W)

, and the Laplacian of a real smooth function q as

▵ q = d i v (grad (q))

; for details and properties see [21]. If we add the requirements that

\nabla_{V} W - \nabla_{W} V = [V, W]

where

[V, W]

is the Lie bracket of both vector fields,

V

and

W

, which is another vector field such that their action, over a smooth function q, is given by

{[V, W]}_{θ} q = V_{θ} (W q) - W_{θ} (V q)

and, additionally,

ξ 〈 V, W 〉 = 〈 \nabla_{ξ} V, W 〉 + 〈 V, \nabla_{ξ} W 〉

, we obtain the Levi–Civita connection, which is uniquely determined through the Christoffel symbols of the second kind. These are defined, again using repeated index summation convention, in terms of the metric tensor as

Γ_{ν λ}^{μ} = \frac{1}{2} g^{μ α} (\frac{\partial g_{α λ}}{\partial θ^{ν}} + \frac{\partial g_{α ν}}{\partial θ^{λ}} - \frac{\partial g_{ν λ}}{\partial θ^{α}}),

(A3)

symbols which encode how the basis vectors change from point to point due to curvature, which is properly quantified through several objects, such as the curvature operator

R (X, Y) Z = \nabla_{Y} \nabla_{X} Z - \nabla_{X} \nabla_{Y} Z - \nabla_{[Y, X]} Z,

(A4)

for vector fields

X

,

Y

, and

Z

in

Θ

and, also, the Riemann–Christoffel tensor defined as

K (W, Z, X, Y) = 〈 W, R (X, Y) Z 〉,

(A5)

with

W

being another vector field in

Θ

, the Riemannian sectional curvatures corresponding to the linearly independent vector fields

X

and

Y

being defined as

K (X, Y) = \frac{K (X, Y, X, Y)}{〈 X, X 〉 〈 Y, Y 〉 - {〈 X, Y 〉}^{2}},

(A6)

and the Ricci tensor being defined as

Ric (X, Y) = trace (Z \mapsto R (X, Z) Y)

. Observe that, if

ξ = ξ^{1} \partial / \partial θ^{1} + \dots + ξ^{m} \partial / \partial θ^{m}

and

V = V^{1} \partial / \partial θ^{1} + \dots + V^{m} \partial / \partial θ^{m}

, if we define

V_{, ν}^{μ} = \frac{\partial V^{μ}}{\partial θ^{ν}} + Γ_{ν λ}^{μ} V^{λ}

, we find that the components of the vector field

\nabla_{ξ} V

are given by

\nabla_{ξ} V = (\frac{\partial V^{μ}}{\partial θ^{ν}} + Γ_{ν λ}^{μ} V^{λ}) ξ^{ν} \frac{\partial}{\partial θ^{μ}} = V_{, ν}^{μ} ξ^{ν} \frac{\partial}{\partial θ^{μ}},

(A7)

where

\frac{\partial V^{μ}}{\partial θ^{ν}}

is the ordinary partial derivative. This formula illustrates how the covariant derivative modifies the ordinary derivative by incorporating terms that account for the twisting and turning of the coordinate system in curved space. The covariant derivative concept is a powerful generalization of ordinary derivatives, extending the concept from flat, Euclidean spaces to the more complex realm of curved manifolds. This extension is crucial for understanding how geometric properties evolve in curved spaces. The covariant derivative incorporates additional terms that account for the curvature of the manifold, effectively correcting for the way basis vectors change as one moves from point to point. This enables us to analyze and describe the behavior of vector fields and other geometric objects in a way that respects the underlying curvature.

It is convenient to generalize the covariant derivative to any tensor field. For this purpose, given

X \in T Θ

and q, a real smooth function on an open subset of

Θ

we denote by

\nabla_{X} q = X q

, and if

w \in T^{*} Θ

and

Y \in T Θ

, then define

(\nabla_{X} w) (Y) = X (w (Y)) - w (\nabla_{X} Y)

. Moreover, given an

(r, s)

-tensor field

A

, the covariant derivative may be extended to obtain an

(r, s + 1)

-tensor field. If

w_{1}, \dots, w_{r} \in T^{*} Θ

and

Y_{1}, \dots, Y_{s} \in T Θ

, then

\begin{matrix} \nabla_{X} A (w_{1}, \dots, w_{r}, Y_{1}, \dots, Y_{s}) & = (\nabla_{X} A) (w_{1}, \dots, w_{1}, Y_{1}, \dots, Y_{s}) \\ + A (\nabla_{X} w_{1}, w_{2}, \dots, w_{r}, Y_{1}, \dots, Y_{s}) + \dots \\ \dots + A (w_{1}, \dots, w_{1}, Y_{1}, \dots, Y_{s - 1}, \nabla_{X} Y_{s}), \end{matrix}

(A8)

and, finally,

\nabla A (w_{1}, \dots, w_{r}, Y_{1}, \dots, Y_{s}, X) = (\nabla_{X} A) (w_{1}, \dots, w_{1}, Y_{1}, \dots, Y_{s}) .

(A9)

Given a connection, the corresponding geodesics, the generalization of straight lines, are the curves whose tangent vector field, T, does not change, i.e.,

\nabla_{T} T = 0

. In components, under a coordinate system

θ^{1}, \dots, θ^{m}

, if we denote, with certain overuse of notation, by

θ (t)

the coordinates of a geodesic, we shall have, using the summation convention of repeated indices,

\frac{d^{2} θ^{k}}{d t^{2}} + Γ_{i j}^{k} \frac{d θ^{i}}{d t} \frac{d θ^{j}}{d t} = 0, k = 1, \dots, m,

(A10)

In the Riemannian case, with the Levi–Civita connection, the geodesics are also, locally, the minimum-length curves.

Next, we review the definition of the exponential map corresponding to a connection that is defined through geodesics as follows. Let

θ

be a point of the manifold

Θ

,

θ \in Θ

,

Θ_{θ}

be the tangent space in

θ

, and let

γ_{ξ} : [0, 1] \to Θ

be a geodesic such that

γ_{ξ} (0) = θ, θ \in Θ and {\frac{d γ}{d t}|}_{t = 0} = ξ .

(A11)

Then, the exponential map is given by

\exp_{θ} (ξ) = γ_{ξ} (1)

, and it is well defined for all

ξ

in an open star-shaped neighborhood of

0_{θ} \in T_{θ} Θ

.

Hereafter, we will consider the Riemannian case with the Levi–Civita connection. We now define

S_{θ} \subset T_{θ} Θ

as

S_{θ} = {ξ \in T_{θ} {Θ : ∥ ξ ∥}_{θ} = 1}

, and for each

ξ \in S_{θ}

, we let

c_{θ} (ξ) = sup {t > 0 : ρ (θ, γ_{ξ} (t)) = t}

, where

ρ

is the Riemannian distance, and

γ_{ξ}

is a geodesic defined in an open interval containing zero, such that

γ_{ξ} (0) = θ

with a tangent vector equal to

ξ

at the origin. Then, if we set

D_{θ} = {t ξ \in T_{θ} Θ : 0 \leq t < c_{θ} (ξ); ξ \in S_{θ}} and D_{θ} = \exp_{θ} (D_{θ}),

(A12)

it is well known that

\exp_{θ}

maps

D_{θ}

diffeomorphically onto

D_{θ}

. If the manifold is also complete, then the boundary of

D_{θ}

,

\partial D_{θ}

is mapped by the exponential map onto

\partial D_{θ}

, called the cut locus of θ in Θ, which in this case has a zero Riemannian measure. Moreover, if the manifold is simply connected, and the Riemannian curvature is non-positive or positive but with a sufficiently small diameter, the cut locus is empty. Additionally, in this case, the inverse of the exponential map is considered a map between two metric spaces:

Θ

, a Riemannian manifold, and

T_{θ} Θ

, the tangent space with Euclidean structure, which preserves the distance between any point to

θ

, although it does not preserve distances between arbitrary points in general. For additional details, see [9,21,22].

Figure A1. The figure illustrates the exponential map corresponding to the Levi–Civita connection in a Riemannian manifold. The lengths of the vectors in

ξ \in D_{θ}

in the tangent space are equal to the Riemannian distance

ρ (C, γ_{ξ} (1))

. This map preserves the radial distances but not the distances in general; the norm

| ξ - ν |

is not equal to the Riemannian distance

ρ (γ_{ξ} (1), γ_{ν} (1)

).

Figure A1. The figure illustrates the exponential map corresponding to the Levi–Civita connection in a Riemannian manifold. The lengths of the vectors in

ξ \in D_{θ}

in the tangent space are equal to the Riemannian distance

ρ (C, γ_{ξ} (1))

. This map preserves the radial distances but not the distances in general; the norm

| ξ - ν |

is not equal to the Riemannian distance

ρ (γ_{ξ} (1), γ_{ν} (1)

).

We now briefly review the concept of Jacobi field along a geodesic. Consider a geodesic

c (s)

in

Θ

and

c^{'} = c_{*} (\partial / \partial s)

, the tangent vector field over

c (s)

. A Jacobi field

Y

along

c (s)

is a field which satisfies the Jacobi equation

\frac{\nabla^{2}}{d s^{2}} Y + R (c^{'}, Y) c^{'} = 0,

(A13)

where

\frac{\nabla}{\partial s}

is the covariant derivative in the direction given by

c^{'}

along c.

Appendix A.2. Proof of Formula (65)

Let

B (γ, R)

be a geodesically convex open ball of radius

R > 0

around

γ \in Θ

, where

Θ

is a complete Riemannian manifold. Let

σ : [0, 1] ⟶ B (γ, R)

be a curve. Let us define

c_{γ} (s, t) = \exp_{γ} (s \exp_{γ}^{- 1} (σ (t)))

and

c_{γ}^{'} \equiv {(c_{γ})}_{*} (\frac{\partial}{\partial s}) {\dot{c}}_{γ} \equiv {(c_{γ})}_{*} (\frac{\partial}{\partial t}),

(A14)

where

{(c_{γ})}_{*}

is the differential of

c_{γ}

, and

\partial / \partial s

and

\partial / \partial t

are the ordinary partial derivative operators on

ℛ^{2}

; observe that

{\dot{c}}_{γ}

is a Jacobi field along the geodesic

s \mapsto c_{γ} (s, t)

, with

{\dot{c}}_{γ} (0, t) = 0

. On the other hand, it is well known, see [9], that for fixed t,

{(\exp_{γ})}_{* | s \exp_{γ}^{- 1} (σ (t))} (T_{s \exp_{γ}^{- 1} (σ (t))} η) = \frac{1}{s} Y (s),

(A15)

where

T_{ξ}

denotes the natural identification of

Θ_{γ}

with

{(Θ_{γ})}_{ξ}

(the tangent of the tangent space), and Y is a Jacobi field along

s \mapsto c_{γ} (s, t)

, with

Y (0) = 0

and

\nabla Y / \partial s = η

.

Let us define

ζ (s, t) = s \exp_{γ}^{- 1} (σ (t))

, then,

ζ_{*} (\frac{\partial}{\partial t}) (1, t) = ({(\exp_{γ}^{- 1})}_{* | σ (t)} \circ σ_{*}) (\frac{\partial}{\partial t}) = {(\exp_{γ}^{- 1})}_{* | σ (t)} (σ_{*} (\frac{\partial}{\partial t})) .

(A16)

Therefore,

\begin{matrix} {(\exp_{γ})}_{* | ζ (1, t)} (ζ_{*} (\frac{\partial}{\partial t}) (1, t)) = {(\exp_{γ})}_{* | ζ (1, t)} ({(\exp_{γ}^{- 1})}_{* | σ (t)} (σ_{*} (\frac{\partial}{\partial t}))) = \\ = ({(\exp_{γ})}_{* | ζ (1, t)} \circ {(\exp_{γ}^{- 1})}_{* | σ (t)}) (σ_{*} (\frac{\partial}{\partial t})) = σ_{*} (\frac{\partial}{\partial t}) = {\dot{c}}_{γ} (1, t), \end{matrix}

(A17)

and simultaneously,

{(\exp_{γ})}_{* | ζ (1, t)} (T_{ζ (1, t)} η) = {\dot{c}}_{γ} (1, t),

(A18)

with

\frac{\nabla {\dot{c}}_{γ}}{\partial s} (0, t) = η

. Thus we obtain

T_{ζ (1, t)} (\frac{\nabla {\dot{c}}_{γ}}{\partial s} (0, t)) = ζ_{*} (\frac{\partial}{\partial t}) (1, t),

(A19)

and, identifying the scalar product in

{(Θ_{γ})}_{ζ (1, t)}

with its corresponding in

Θ_{γ}

,

∥ ζ_{*} (\frac{\partial}{\partial t}) (1, t) ∥_{γ} = ∥ \frac{\nabla {\dot{c}}_{γ}}{\partial s} (0, t) ∥_{γ} .

(A20)

Therefore the length of

\exp_{γ}^{- 1} (σ (t))

from

t = 0

to

t = 1

will be given by

l_{E} = \int_{0}^{1} ∥ ζ_{*} (\frac{\partial}{\partial t}) (1, t) ∥_{γ} d t = \int_{0}^{1} ∥ \frac{\nabla {\dot{c}}_{γ}}{\partial s} (0, t) ∥_{γ} d t .

(A21)

Additionally, from the geometric Rauch theorem, see [9], we have

\frac{∥ {\dot{c}}_{γ}^{n o r} {(1, t) ∥}_{σ (t)}}{S_{δ} (ρ_{t})} ρ_{t} \leq ∥ \frac{\nabla {\dot{c}}_{γ}^{n o r}}{\partial s} (0, t) ∥_{γ} \leq \frac{∥ {\dot{c}}_{γ}^{n o r} {(1, t) ∥}_{σ (t)}}{S_{Δ} (ρ_{t})} ρ_{t},

(A22)

as long as

S_{δ}

and

S_{Δ}

do not vanish, where

S_{K} (t) = \{\begin{matrix} \frac{sin (\sqrt{K} t)}{\sqrt{K}} & if K > 0, \\ t & if K = 0, \\ \frac{sinh (\sqrt{- K} t)}{\sqrt{- K}} & if K < 0, \end{matrix}

(A23)

ρ_{t} = {∥ \exp_{γ}^{- 1} (σ (t)) ∥}_{γ}

,

{\dot{c}}_{γ}^{n o r}

is the normal component of the Jacobi field, with respect to

c_{γ}^{'}

and

δ \leq K (X, Y) \leq Δ for any X, Y \in T B (m, R),

(A24)

with K being the sectional Riemannian curvature, and

T B (m, R)

being the tangent Riemannian ball bundle. Moreover, taking into account the properties of the tangential Jacobi fields, we have

∥ \frac{\nabla {\dot{c}}_{γ}^{t a n}}{\partial s} (0, t) ∥_{γ} = {∥ {\dot{c}}_{γ}^{t a n} (1, t) ∥}_{σ (t)},

(A25)

where

{\dot{c}}_{γ}^{t a n}

is the tangential component of

{\dot{c}}_{γ}

and

∥ \frac{\nabla {\dot{c}}_{γ}}{\partial s} ∥_{c (s, t)}^{2} = ∥ \frac{\nabla {\dot{c}}_{γ}^{n o r}}{\partial s} ∥_{c (s, t)}^{2} + ∥ \frac{\nabla {\dot{c}}_{γ}^{t a n}}{\partial s} ∥_{c (s, t)}^{2},

(A26)

and

∥ {\dot{c}}_{γ} ∥_{c (s, t)}^{2} = ∥ {\dot{c}}_{γ}^{n o r} ∥_{c (s, t)}^{2} + ∥ {\dot{c}}_{γ}^{t a n} ∥_{c (s, t)}^{2},

(A27)

we obtain

∥ \frac{\nabla {\dot{c}}_{γ}}{\partial s} (0, t) ∥_{γ}^{2} \leq ∥ {\dot{c}}_{γ} (1, t) ∥_{σ (t)}^{2} + ∥ {\dot{c}}_{γ}^{n o r} (1, t) ∥_{σ (t)}^{2} (\frac{ρ_{t}^{2}}{S_{Δ}^{2} (ρ_{t})} - 1),

(A28)

and

∥ \frac{\nabla {\dot{c}}_{γ}}{\partial s} (0, t) ∥_{γ}^{2} \geq ∥ {\dot{c}}_{γ} (1, t) ∥_{σ (t)}^{2} + ∥ {\dot{c}}_{γ}^{n o r} (1, t) ∥_{σ (t)}^{2} (\frac{ρ_{t}^{2}}{S_{δ}^{2} (ρ_{t})} - 1) .

(A29)

Then, taking into account that

S_{K}^{2} (t) = \sum_{j = 0}^{\infty} \frac{{(- K)}^{j}}{(2 j + 1)!} t^{2 j + 1},

(A30)

we have

h_{K} (t) \equiv \frac{t^{2}}{S_{K}^{2} (t)} - 1 = \frac{1}{3} K t^{2} {1 + O (t^{2})},

(A31)

with

h_{K} (t)

being a strictly positive increasing function if

K > 0

(as long as

S_{K} (t)

does not vanish) and a decreasing function such that

- 1 < h_{K} (t) < 0

if

K < 0

. Therefore,

∥ \frac{\nabla {\dot{c}}_{γ}}{\partial s} (0, t) ∥_{γ}^{2} \leq (1 + max (0, h_{Δ} (R))) ∥ {\dot{c}}_{γ} (1, t) ∥_{σ (t)}^{2},

(A32)

and

∥ \frac{\nabla {\dot{c}}_{γ}}{\partial s} (0, t) ∥_{γ}^{2} \geq (1 + min (0, h_{δ} (R))) ∥ {\dot{c}}_{γ} (1, t) ∥_{σ (t)}^{2},

(A33)

since

ρ_{t} \leq R

. On the other hand, the Riemannian length of

σ

is given by

l_{R} = \int_{0}^{1} ∥ {\dot{c}}_{m} (1, t) ∥_{σ (t)} d t .

(A34)

Therefore,

\sqrt{(1 + min (0, h_{δ} (R)))} l_{R} \leq l_{E} \leq \sqrt{(1 + min (0, h_{Δ} (R)))} l_{R} .

(A35)

Let us consider now

θ_{α}, θ_{β} \in B (γ, R / 2)

. In this case, the shortest geodesic that joins

θ_{α}

and

θ_{β}

lies in

B (γ, R)

. Let

σ (t)

be this geodesic with

σ (0) = θ_{α}

,

σ (1) = θ_{β}

, and let us define

ρ \equiv l_{R} = ρ (θ_{α}, θ_{β})

, i.e., the Riemannian distance between

θ_{α}

and

θ_{β}

. Then, we have

d = ∥ \exp_{γ}^{- 1} (θ_{α}) - \exp_{γ}^{- 1} (θ_{β}) ∥_{γ} \leq l_{E} \leq \sqrt{(1 + min (0, h_{Δ} (R)))} ρ .

(A36)

On the other hand, if

θ_{α}, θ_{β} \in B (γ, R)

, and if

r (t)

is the straight line in

Θ_{γ}

such that

r (0) = \exp_{γ}^{- 1} (θ_{α})

and

r (1) = \exp_{γ}^{- 1} (θ_{β})

, then

σ (t) = \exp_{γ} (r (t)) \in B (γ, R)

with

σ (0) = θ_{α}

,

σ (1) = θ_{β}

. Thus, we have

d \geq \sqrt{(1 + min (0, h_{δ} (R)))} l_{R} \geq \sqrt{(1 + min (0, h_{δ} (R)))} ρ,

(A37)

since

l_{R} \geq ρ \equiv ρ (θ_{α}, θ_{β})

.

Thus, combining (A36) and (A37), we have

min (0, h_{δ} (R)) \leq \frac{d^{2} - ρ^{2}}{ρ^{2}} \leq max (0, h_{Δ} (R)),

(A38)

and defining

D = max (| δ |, | Δ |)

, we obtain

\frac{| d^{2} - ρ^{2} |}{ρ^{2}} \leq h_{D} (R),

(A39)

and therefore (65).

Appendix A.3. Additional Remarks

As a reminder, it has been proven that well-known indices, such as the Kullback–Leibler divergence, locally induce, up to a proportionality constant, the information metric on the parameter space of the statistical model. Further characterizations of the information metric, in terms of invariance under Markov kernels’ transformations, are given in [23]. With the Riemannian distance, say

ρ

, such that

ρ \geq D

, the manifold

Θ

or

ϕ (Θ)

becomes a length space, unlike these manifolds considering the Hilbert metric structure. The information metric has been studied for several parametric regular statistical models; for instance, see [11,24]. Moreover, though the information metric is possible to develop, in a natural way, an intrinsic approach to statistical estimation, invariant under reparametrizations, see [25].

Concerning the Hermitian kernel notion, observe that the Hermitian property

K (θ, γ) = \bar{K (γ, θ)}

is a consequence of the positive-definiteness, and it is not necessary to make it explicit in the definition. If, for mutually distinct

θ_{1}, \dots, θ_{n}

, the equality only holds for

z_{1}, = \dots = z_{n} = 0

, it is often said that K is a strictly positive definite Hermitian kernel. It is also possible to consider real-valued kernels satisfying

K : Θ \times Θ \to R

such that

K (θ, γ) = K (γ, θ)

and

\sum_{i, j = 1}^{n} α_{i} α_{j} K (θ_{i}, θ_{j}) \geq 0

for any positive integer

n \in N

,

α_{i} \in R

,

θ_{i} \in Θ

i = 1, \dots, n

.

Moreover, observe that the natural kernel suggested, corresponding to a regular parametric family, satisfies

K (θ, γ) = \bar{K (γ, θ)}

, since clearly

\bar{\int_{Θ} q (ω) μ (d ω)} = \int_{Θ} \bar{q (ω)} μ (d ω)

, and for any scalars

z_{1}, \dots, z_{n} \in C

, where n is an arbitrary positive integer,

\sum_{i, j = 1}^{n} z_{i} \bar{z_{j}} K (θ_{i}, θ_{j}) = κ^{2} \sum_{i, j = 1}^{n} z_{i} \bar{z_{j}} \int_{Θ} h (ω, θ_{i}) \bar{h (ω, θ_{j})} μ (d ω) = κ^{2} \int_{Θ} (\sum_{i = 1}^{n} z_{i} h (ω, θ_{i}))

\bar{(\sum_{j = 1}^{n} z_{j} h (ω, θ_{j}))} μ (d ω) \geq 0

; therefore,

K (θ, γ)

is a complex-valued Hermitian kernel.

With additional assumptions about the parameter space from measure and topological theory, many other properties of kernels can be established, such as Mercer’s theorem and relationships between square-integrable function spaces and RKHS via linear operators. Additionally, to clarify the relationship between the fundamental tensor and the kernel, we can utilize several results made explicit in Chapter 4 of [13].

References

Frieden, B. Science from Fisher Information: A Unification, 2nd ed.; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar] [CrossRef]
Absil, P.A.; Mahony, R.; Sepulchre, R. Optimization Algorithms on Matrix Manifolds; Princeton University Press: Princeton, NJ, USA, USA, 2007. [Google Scholar]
Amari, S.i. Information Geometry and Its Applications, 1st ed.; Springer Publishing Company, Incorporated: Berlin/Heidelberg, Germany, 2016. [Google Scholar]
Bernal-Casas, D.; Oller, J.M. Variational Information Principles to Unveil Physical Laws. Mathematics 2024, 12, 3941. [Google Scholar] [CrossRef]
Chalmers, D. Constructing the World; Oxford University Press: Oxford, UK, 2012. [Google Scholar]
Lang, S. Fundamentals of Differential Geometry; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012; Volume 191. [Google Scholar]
Burbea, J.; Rao, C.R. Entropy differential metric, distance and divergence measures in probability spaces: A unified approach. J. Multivar. Anal. 1982, 12, 575–596. [Google Scholar] [CrossRef]
Kobayashi, S.; Nomizu, K. Foundations of Differential Geometry; John Wiley & Sons: Hoboken, NJ, USA, 1963; Volumes 1 and 2. [Google Scholar]
Chavel, I. Riemannian Geometry: A Modern Introduction; Cambridge Studies in Advanced Mathematics; Cambridge University Press: Cambridge, UK, 2006. [Google Scholar]
Rao, C. Information and Accuracy Attainable in Estimation of Statistical Parameters. Bull. Calcutta Math. Soc. 1945, 37, 81–91. [Google Scholar]
Atkinson, C.; Mitchell, A.F.S. Rao’s Distance Measure. Sankhyā Indian J. Stat. Ser. A 1981, 43, 345–365. [Google Scholar]
Aronszajn, N. Theory of Reproducing Kernels. Trans. Am. Math. Soc. 1950, 68, 337–404. [Google Scholar] [CrossRef]
Steinwart, I.; Christmann, A. Support Vector Machines; Springer Science & Business Media: New York, NY, USA, 2008. [Google Scholar]
Schölkopf, B.; Smola, A.J. Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond; MIT Press: Cambridge, MA, USA, 2002. [Google Scholar]
Coddington, E.E.; Levinson, N. Theory of Ordinary Differential Equations; McGraw-Hill: New York, NY, USA, 1955; Reprinted in December 1984. [Google Scholar]
Reverter, F.; Vegas, E.; Oller, J. Kernel-PCA data integration with enhanced interpretability. BMC Syst. Biol. 2014, 8, S6. [Google Scholar] [CrossRef] [PubMed]
Karcher, H. Riemannian center of mass and mollifier smoothing. Commun. Pure Appl. Math. 1977, 30, 509–541. [Google Scholar] [CrossRef]
Berger, J.O. Statistical Decision Theory and Bayesian Analysis; Springer: New York, NY, USA, 1985. [Google Scholar]
Heyer, H. Theory of Statistical Experiments; Springer: New York, NY, USA, 1983. [Google Scholar]
Strasser, H. Mathematical Theory of Statistics: Statistical Experiments and Asymptotic Decision Theory; Walter de Gruyter: Berlin, Germany, 1985. [Google Scholar]
Chavel, I. Eigenvalues in Riemannian Geometry; Elsevier: Philadelphia, PA, USA, 1984. [Google Scholar] [CrossRef]
Spivak, M. A Comprehensive Introduction to Differential Geometry; Publish or Perish: Berkeley, CA, USA, 1970; Volumes 1–5. [Google Scholar]
Chentsov, N.N. Statistical Decision Rules and Optimal Inference; Translations of Mathematical Monographs; Translated from the Russian by the Israel Program for Scientific Translations; American Mathematical Society: Providence, RI, USA, 1982; Volume 53. [Google Scholar] [CrossRef]
Calvo, M.; Oller, J. An explicit solution of information geodesic equations for the multivariate normal model. Stat. Decis. 1991, 9, 119–138. [Google Scholar] [CrossRef]
Oller, J.M.; Corcuera, J.M. Intrinsic Analysis of Statistical Estimation. Ann. Stat. 1995, 23, 1562–1581. [Google Scholar] [CrossRef]

Figure 1. In the parameter space

Θ

, a manifold, for example an open set of

R^{m}

, is mapped into

H_{K}

, which also acts as a feature space of the natural kernel (25) defined at initio through

L^{2} (Θ, μ)

, via the canonical feature map

ϕ

.

H_{K}

is a Hilbert space of functions that is identifiable with a closed subspace of

L^{2} (Θ, μ)

and satisfies the reproducing property, with the point evaluation functional being continuous.

Figure 1. In the parameter space

Θ

, a manifold, for example an open set of

R^{m}

, is mapped into

H_{K}

, which also acts as a feature space of the natural kernel (25) defined at initio through

L^{2} (Θ, μ)

, via the canonical feature map

ϕ

.

H_{K}

is a Hilbert space of functions that is identifiable with a closed subspace of

L^{2} (Θ, μ)

and satisfies the reproducing property, with the point evaluation functional being continuous.

Figure 2. A bundle of smooth, blue integral curves winds through the space, each representing a trajectory moving according to the underlying vector field. These curves emanate from various points, illustrating the flow lines that define the field’s behavior. Overlaid on the scene are red vector fields, with arrows indicating the direction and magnitude of the vectors at each point. The interplay between the flowing blue curves and the red vectors visualizes the dynamic structure and flow patterns of the vector field.

Figure 3. This illustration visualizes how the points a, b, and c are mapped onto the feature space, facilitating analysis within the RKHS framework. It also depicts the gradient fields in the feature space.

Figure 4. The inverse of the exponential map illustrated here operates on the set

Θ ∖ {C^{'}}

, where

C^{'}

represents the antipodal point of

C

on the manifold. This inverse mapping effectively retrieves the tangent vector at

C

that, when exponentiated, reaches a given point in the domain, excluding the antipode

C^{'}

to ensure the map’s well-definedness and invertibility.

Figure 4. The inverse of the exponential map illustrated here operates on the set

Θ ∖ {C^{'}}

, where

C^{'}

represents the antipodal point of

C

on the manifold. This inverse mapping effectively retrieves the tangent vector at

C

that, when exponentiated, reaches a given point in the domain, excluding the antipode

C^{'}

to ensure the map’s well-definedness and invertibility.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Bernal-Casas, D.; Oller, J.M. Information-Geometric Models in Data Analysis and Physics. Mathematics 2025, 13, 3114. https://doi.org/10.3390/math13193114

AMA Style

Bernal-Casas D, Oller JM. Information-Geometric Models in Data Analysis and Physics. Mathematics. 2025; 13(19):3114. https://doi.org/10.3390/math13193114

Chicago/Turabian Style

Bernal-Casas, D., and José M. Oller. 2025. "Information-Geometric Models in Data Analysis and Physics" Mathematics 13, no. 19: 3114. https://doi.org/10.3390/math13193114

APA Style

Bernal-Casas, D., & Oller, J. M. (2025). Information-Geometric Models in Data Analysis and Physics. Mathematics, 13(19), 3114. https://doi.org/10.3390/math13193114

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Information-Geometric Models in Data Analysis and Physics

Abstract

1. Introduction

2. Materials and Methods

2.1. The Information Sources and the Parameter Space

2.2. The Riemannian Structure in the Parameter Space

2.3. Kernel Methods

2.3.1. The Case of the Gradient Operator

2.3.2. Projecting onto an Affine Manifold

2.4. Data Analysis Applications: Mapping Objects in a Low-Dimensional Space

2.4.1. Working in the Feature Space

2.4.2. PCA in a Tangent Space

2.5. Physical Applications

The Variational Principle

3. Discussion

Author Contributions

Funding

Institutional Review Board Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Appendix A

Appendix A.1. Differential Geometry Remarks

Appendix A.2. Proof of Formula (65)

Appendix A.3. Additional Remarks

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI