2.1. Mathematical Framework
Assume the following framework. Let the sample space of the problem under consideration be given, together with a σ-algebra of its subsets and a positive σ-finite measure defined on the resulting measurable space. In this paper, we define a parametric family of functions as a triple formed by this measure space, a manifold, also referred to as the parameter space, and a measurable mapping f such that, for each value of the parameter, f is a probability density, meaning that it defines a probability measure on the sample space. Here, the measure above plays the role of the reference measure and f that of the model function.
For simplicity, we assume that the parameter space is an m-dimensional real manifold, Hausdorff, connected, and possibly with boundary, while noting that infinite-dimensional Hilbert or Banach manifolds could also be treated within this framework. In many cases, it suffices to consider the parameter space as a connected open subset of m-dimensional real Euclidean space, using the same symbol for both points of the manifold and their coordinates. We adopt this convention for clarity, while noting that the results extend to more general settings. The model function f is assumed to satisfy the minimal regularity conditions required for the Fisher information matrix to exist, avoiding stronger assumptions. Thus, we work within the essential framework of information geometry, where the parameter space is viewed as a Riemannian manifold with metric tensor given, in covariant components, by (1),
where the random variable involved has the distribution given by the probability measure of the model. The expectation in (1) is obtained by integrating the products of first-order partial derivatives with respect to the underlying probability measure. If G denotes the Fisher information matrix, then the Riemannian volume element is given by the square root of its determinant times the coordinate volume element. For background, see Rao's seminal work [10] and further developments in [11,12,13,14], among others.
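In standard notation, writing the model density as f(x; θ) with local coordinates θ = (θ¹, …, θ^m) (symbols introduced here only for illustration), the covariant components referred to in (1) presumably take the classical Rao form
\[
g_{ij}(\theta) \;=\; \mathrm{E}_{\theta}\!\left[\frac{\partial \log f(X;\theta)}{\partial \theta^{i}}\;\frac{\partial \log f(X;\theta)}{\partial \theta^{j}}\right],
\qquad i,j=1,\dots,m,
\]
with associated Riemannian volume element \( dV(\theta)=\sqrt{\det G(\theta)}\,d\theta^{1}\cdots d\theta^{m} \).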
We now define the information carried by the data relative to the true parameter. The log-likelihood is a natural candidate; still, it is not invariant under injective transformations of the data (e.g., rescaling). To restore invariance, one may fix a reference point and define the information carried by the data, relative to the true parameter and referred to that reference point, as in (2). The dependence of (2) on the reference point is omitted from the notation, as it is irrelevant for computing gradients on the parameter manifold. For a fixed reference point, (2) is invariant under both admissible data transformations and coordinate changes on the parameter manifold, and thus constitutes a scalar field on the parameter space.
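A form of (2) consistent with these requirements, again in the illustrative notation f(x; θ) and writing θ₀ for the reference point, is the negative log-likelihood referred to the reference value,
\[
\mathcal{I}(x;\theta) \;=\; -\log\frac{f(x;\theta)}{f(x;\theta_{0})}
\;=\; -\log f(x;\theta) + \log f(x;\theta_{0}),
\]
whose gradient on the parameter manifold coincides with that of the negative log-likelihood, since the reference term does not depend on θ; under an injective transformation of the data, the Jacobian factors cancel in the ratio.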
Information from an external source is internalized by the observer, enabling them to assess the objects of interest. We focus on two aspects: the parameter space with its natural information geometry, and the observer's construction of a plausibility function for the true parameter value given a sample. This plausibility is a complex-valued measurable map such that a suitable power of its modulus, obtained through the product of the map with its complex conjugate, defines, up to a non-negative normalization, a subjective conditional probability density with respect to the Riemannian volume induced by the information metric on the parameter manifold.
Specifically, once the sample is given and the plausibility has been normalized, we will assume that it yields a probability density with respect to the Riemannian volume, with support in a subset of the parameter manifold, and thus we shall write, at the outset, the normalization constraint (3). For reasons that will become apparent, we shall focus primarily on the case in which this density is the squared modulus of the plausibility, since in this case it is a probability density, with respect to the Riemannian volume, concentrated on the closure of the set where, for the given sample, the likelihood is strictly positive. In (3), integration is with respect to the Riemannian measure (1), ensuring invariance under coordinate changes.
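In the focused case, writing ψ(θ | x) for the plausibility and Θ for the parameter manifold (illustrative notation), the constraint (3) presumably amounts to the normalization of the squared modulus with respect to the Riemannian volume,
\[
\int_{\Theta} \psi(\theta\mid x)\,\overline{\psi(\theta\mid x)}\;\sqrt{\det G(\theta)}\;d\theta^{1}\cdots d\theta^{m} \;=\; 1 .
\]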
If we define a probability on the parameter manifold, representing the propensity of a parameter to be true as in Bayesian statistics, the normalized function can be taken as its Radon–Nikodym derivative with respect to the Riemannian volume, which is itself a positive measure on the parameter manifold (see [15]). Both measures are coordinate-independent, so this derivative is an invariant scalar field on the parameter manifold. We may then define the information encoded by the subjective plausibility, relative to the true parameter, as in (4). The quantity (4) remains invariant under coordinate changes on the parameter manifold, and thus constitutes a scalar field on it.
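By analogy with (2), a natural candidate for the quantity defined in (4), in the same illustrative notation, is minus the logarithm of the normalized subjective density,
\[
\mathcal{J}(\theta) \;=\; -\log\bigl(\psi(\theta\mid x)\,\overline{\psi(\theta\mid x)}\bigr),
\]
which is an invariant scalar field on the parameter manifold because both the density and the Riemannian volume are coordinate-independent.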
Note, at this point, that there are infinitely many ways of constructing, given the sample, the aforementioned plausibility. In Section 2.2 we present some variational procedures to obtain it.
2.2. An Extended Variational Principle
Assuming that the observer's abilities have been shaped by natural selection, we can posit that subjective information adapts to the source information and, in particular, satisfies the following variational principle: the functional (5) is a minimum, or at least stationary, subject to the constraint (3) for a suitable choice of its ingredients, assuming that the plausibility is constant on the boundary of its support (and hence on the boundary of the parameter manifold), or that it vanishes there or at infinity. These conditions ensure that the true parameter lies within the support. The functional (5) is, up to normalization, the expected value of a power of the norm, associated with the Riemannian metric in (1), of the difference between the gradients of the source information (2) and of the subjective information (4), with expectation taken with respect to the probability on the parameter manifold given by the normalized density and the Riemannian volume. Equation (5) is invariant under coordinate changes, since both the squared norm and the density are invariant. The source is treated as objective (or intersubjective), while the parameter space, with its geometric structure, is in part observer dependent, although strongly constrained by the source.
Any change in the information encoded by the data, due to a modification of the source over the parameter space, should correspond to a change in the subjective information of the observer. Consequently, the squared norm of the difference between the two gradients, divided by the sample size n, should on average be locally minimized, that is, it should be as small as possible.
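In the focused case, and in the illustrative notation of the earlier sketches, a reading of (5) consistent with the description above is
\[
\mathcal{F} \;=\; \frac{1}{n}\int_{\Theta}
\bigl\|\,\nabla\mathcal{I}(x;\theta)-\nabla\mathcal{J}(\theta)\,\bigr\|_{G}^{2}\;
\psi\overline{\psi}\;\sqrt{\det G(\theta)}\;d\theta ,
\]
where \( \|\cdot\|_{G} \) denotes the norm induced by the Fisher metric (1); that is, the expectation, with respect to the subjective probability on the parameter manifold, of the squared norm of the difference between the gradients of the source information (2) and of the subjective information (4), divided by the sample size n.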
2.3. Solving the Extended Variational Problem
Because of (5), we face a class of optimization problems subject at least to the constraint (3), so we may introduce the augmented Lagrangian (6), with a Lagrange multiplier attached to the constraint; a possible form is sketched below. Notice also that the resulting expression depends implicitly on x and is invariant under coordinate changes on the parameter manifold. Let an arbitrary smooth complex-valued function be given, and assume that the plausibility satisfies (3). Omitting the explicit x-dependence from the notation for simplicity, we then have
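A possible form of the augmented Lagrangian (6), in the illustrative notation used above and writing λ for the Lagrange multiplier, is
\[
\mathcal{L}(\psi,\lambda) \;=\;
\frac{1}{n}\int_{\Theta}\bigl\|\nabla\mathcal{I}-\nabla\mathcal{J}\bigr\|_{G}^{2}\,
\psi\overline{\psi}\,\sqrt{\det G}\;d\theta
\;+\;\lambda\!\left(\int_{\Theta}\psi\overline{\psi}\,\sqrt{\det G}\;d\theta-1\right),
\]
whose first variation with respect to the plausibility is the object computed in the remainder of this subsection.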
If we write the plausibility in terms of its real and imaginary parts, which may depend on x, and likewise consider the real and imaginary parts of the arbitrary function, and if we define an auxiliary quantity A, which is obviously a function of those parts, then, after one further definition, we will have
Using the preceding relations, if we define a quantity that is constant with respect to the parameter but depends on x, then, under the assumption of sufficiently smooth functions, we have
Therefore, taking the above into account, we obtain two further relations. If we now define B together with a companion quantity, then we have the corresponding expressions, obtaining the first variation of the Lagrangian (6) as
Taking the above into account, and after the corresponding substitutions, we have
On the other hand, observe that, by the Gauss divergence theorem, we have the following identity, in which the unit vector field on the boundary points outward and the surface element of the boundary is the one induced by the Riemannian metric on the parameter manifold. Taking into account that, by the boundary conditions, the variation vanishes on the boundary or at infinity, we obtain (19). Then, since the divergence term can be rewritten by means of the Laplace operator Δ, we have
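For reference, the divergence theorem invoked here takes, on a Riemannian manifold with boundary, the standard form
\[
\int_{\Theta}\operatorname{div}(V)\,\sqrt{\det G}\;d\theta
\;=\;\int_{\partial\Theta}\langle V,\nu\rangle\;dS ,
\]
for a smooth vector field V, with ν the outward unit normal and dS the induced surface element; the operator Δ is accordingly the Laplace–Beltrami operator, \( \Delta h=\operatorname{div}(\operatorname{grad} h) \).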
However, this first variation must vanish for arbitrary variations of the real and imaginary parts. Therefore, we arrive at an equation which may be written as the fundamental equation (24). With a suitable change of unknown, the fundamental Equation (24) becomes (25).
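Judging from the boundary-value analysis of Section 2.7, Equation (25) has the structure of a Schrödinger-type eigenvalue problem; schematically, and only as a guide to its form,
\[
-\,\Delta u \;+\; V\,u \;=\; \lambda\, u ,
\]
where u stands for the modulus of the plausibility, Δ is the Laplace–Beltrami operator on the parameter manifold, V is a potential-like scalar field built from the source information, and λ is the Lagrange multiplier; the precise coefficients are those displayed in (25).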
Equation (25) yields the stationary points of the variational problem (5) under the constraint (3), which are not necessarily minimum points. Observe that (25) is in fact an equation that only restricts the modulus of the solution, leaving its argument arbitrary; that is, the solutions of the variational problem (5) consist of a modulus determined by (25) multiplied by an arbitrary phase factor.
If we choose the arbitrary argument conveniently, then we obtain a direct probabilistic interpretation, that is, the squared modulus of the normalized solution is a probability density with respect to the Riemannian volume. In this case, the equations obtained can be reinterpreted as a procedure for obtaining, from the available data, a Bayesian posterior distribution without the explicit intervention of a prior distribution.
2.4. The Model
The basic model that will be proposed attempts to describe a situation in which we have two approximately independent sources of information, with replicas of respective sizes that are not necessarily equal to each other, although the two sizes may be linked, as we will discuss later. Based on this basic model, we will assume that we have a random sample comprising both subsamples; that is, we will base our calculations on a total of n independent data.
Specifically, let the first source be an m-variate normal random vector with unknown mean vector and a known strictly positive definite covariance matrix, and suppose that we dispose of independent, identically distributed copies of it, with joint absolutely continuous density given by (26). We identify the relevant elements with m-column vectors as needed, and we also define the associated sample quantities. In (26), det and Tr represent the determinant and trace operators acting on the covariance matrix defined above.
Additionally, we will assume that we dispose of identically distributed and independent copies of a random variable T, with a joint absolutely continuous density given by (27), where the ordered sample appears, together with the characteristic function of the appropriate interval, a known positive constant, and the parameter of interest. Observe that, although this model is not regular, it is still possible to define the information metric, as in [16].
Combining (26) and (27), we obtain the basic model
The parameter space is an (m + 1)-dimensional manifold whose points collect the mean vector of the normal part and the parameter of T, together with the associated coordinate basis vector fields. For clarity, some results will be stated as propositions, even when elementary.
Proposition 1. The Fisher information matrix G of the model, in the coordinates introduced above, has the block structure displayed in (29), where the first diagonal block is m × m, the second is 1 × 1, and G is a block matrix. All the elements of these block matrices depend on the coordinates mentioned above; the Fisher information G and its inverse are given by the displayed expressions, and its determinant and the square root of the latter follow as a consequence. Then we have the displayed formulas.

Proof. The proof is just a computation. First, observe that, in matrix notation,
and therefore, with the indicated definition, the stated expression follows. Since the normal random vector and T are independent, the cross terms vanish, and since, with probability one, the corresponding identity holds, we obtain (29). The inverse, the determinant, and its square root follow directly, completing the proof. □
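Under the stated assumptions (independent subsamples, an m-variate normal block with known covariance Σ and subsample size n₁, and a one-dimensional non-regular block contributed by T; Σ and n₁ are symbols assumed here), the structure described in Proposition 1 presumably reads
\[
G \;=\;\begin{pmatrix} n_{1}\,\Sigma^{-1} & 0\\[2pt] 0 & g_{\tau\tau}\end{pmatrix},
\qquad
G^{-1} \;=\;\begin{pmatrix} \tfrac{1}{n_{1}}\,\Sigma & 0\\[2pt] 0 & g_{\tau\tau}^{-1}\end{pmatrix},
\qquad
\det G \;=\; \frac{n_{1}^{\,m}\;g_{\tau\tau}}{\det\Sigma},
\]
where g_{ττ} denotes the scalar information contributed by the T-subsample; the normal block n₁Σ⁻¹ is the standard Fisher information of the mean of n₁ independent m-variate normal observations with known covariance Σ.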
Assume now that we dispose of a particular sample of independent copies of the normal vector and of T, that is, observed values of each of the two subsamples, where n and the two subsample sizes are positive integers. After some tedious but straightforward computation, the joint likelihood of the model will be as given in (36).
We regard the relevant elements as column vectors and introduce the corresponding sample quantities. We take the reference point with one of its components arbitrary and the other conveniently fixed. The associated Mahalanobis distance is then defined in the usual way (see [17]).
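For completeness, writing x̄ for the sample mean of the normal subsample, μ for the mean parameter, and Σ for the known covariance matrix (notation assumed here), the squared Mahalanobis distance referred to above is the standard
\[
d_{\Sigma}^{2}(\bar{x},\mu) \;=\; (\bar{x}-\mu)^{\mathsf T}\,\Sigma^{-1}\,(\bar{x}-\mu).
\]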
Let us consider fixed sample values for both subsamples and additionally suppose that we have strong reasons to believe that the density is absolutely continuous with respect to the Riemannian volume and is concentrated on a region determined by an arbitrarily large real number. Therefore, the information encoded by the data, relative to the true parameter and referred to the chosen reference point, defined as minus the logarithm of (36), is
Then, we have the next proposition.
Proposition 2. The partial derivatives of the information defined in (37), using the summation convention for repeated indices and classical notation, where a comma preceding an index indicates the covariant derivative in the direction of the coordinate indicated by that index, are given by (38) and (39), where the derivatives at the boundary point are defined as right derivatives. The gradient of this information, in matrix notation, is given by (40), and the square of its norm is given by the scalar field (41).

Proof. Accounting for the symmetry of the covariance matrix and its inverse, with elements
indexed by superscripts without tensorial meaning, we obtain (38) and (39). For the gradient computation, from (30) and using matrix notation, we have the displayed expression, obtaining (40). The square of its norm then follows, obtaining (41). □
Proposition 3. For an arbitrary function h, the Laplacian, using the repeated-index summation convention and the shorthand notation introduced above, is given by (44). In particular, (45) holds, where the second derivatives at the boundary point are defined as right second derivatives.

Proof. With the above-mentioned notation, the Laplacian of
h is given by the standard coordinate expression (see the sketch after this proof), obtaining (44). Moreover, taking into account (38) and (39), we have the corresponding expression. Since, by the repeated-index summation convention, the indicated contraction simplifies, we obtain (45). □
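For reference, on a Riemannian manifold with metric components g_{ij}, inverse g^{ij}, and g = det(g_{ij}), the Laplacian of a smooth function h appearing in Proposition 3 is the Laplace–Beltrami operator, which in the repeated-index summation convention reads
\[
\Delta h \;=\; \frac{1}{\sqrt{g}}\,\partial_{i}\!\bigl(\sqrt{g}\,g^{ij}\,\partial_{j}h\bigr)
\;=\; g^{ij}\,\partial_{i}\partial_{j}h
\;+\;\frac{1}{\sqrt{g}}\,\partial_{i}\!\bigl(\sqrt{g}\,g^{ij}\bigr)\,\partial_{j}h ;
\]
Equation (44) is presumably this expression written in the coordinates of the present model.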
2.7. Solving (50) and (51)
In the case considered, after the corresponding substitutions, division by the common factor yields from (50) an equation which must be solved with the stated boundary conditions. This is essentially Equation (30) or (36) of [8]. No nontrivial solutions exist unless a quantization condition is satisfied. This condition fixes the admissible values of the constant playing the role of the energy, indexed by a non-negative integer; for the ground state, the lowest admissible value is obtained. Nontrivial solutions for each admissible value are given by Hermite polynomials [8]; the corresponding wave functions for the lowest states are the associated Hermite–Gaussian expressions.
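In the standard treatment of [8], and writing the relevant coordinate as y with a positive scale constant a (symbols assumed here for illustration), the lowest Hermite-polynomial solutions take the familiar form
\[
\psi_{0}(y)\;\propto\; e^{-a y^{2}/2},
\qquad
\psi_{1}(y)\;\propto\; y\,e^{-a y^{2}/2},
\]
corresponding to the ground and first excited states of the quantum harmonic oscillator.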
On the other hand, Equation (51) results in a further relation. Using the preceding identity in the relevant range, we obtain a simplified equation and, making the energy explicit instead of the multiplier, we shall have (59). Basically, we can re-express Equation (59) in a more convenient form. Provided that a certain coefficient does not vanish, we obtain the general solution, and in the remaining case a separate expression is obtained. To satisfy the boundary conditions, we must carefully choose the integration constants and the value of the free constant. Several cases can be considered; however, at this point, we only indicate that convenient values of this constant can yield solutions compatible with the wave equation of a quantum harmonic oscillator.
For instance, consider the case in which we are in the ground state of the quantum harmonic oscillator, which nullifies the corresponding term. From Equation (59), we obtain the following
This equation again represents an eigenvalue problem, with E as the eigenvalue. With a convenient choice of the remaining constant, we find that the energy corresponds to the intrinsic Cramér–Rao bound, where m is the number of parameters involved. In the exponential case, an analogous expression is obtained.
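As a reminder of the classical counterpart of the intrinsic bound mentioned above, for an unbiased estimator based on n independent observations with per-observation Fisher matrix G(θ), the Cramér–Rao inequality gives
\[
\mathrm{E}\!\left[(\hat{\theta}-\theta)^{\mathsf T} G(\theta)\,(\hat{\theta}-\theta)\right]
\;\ge\; \operatorname{Tr}\!\bigl(G(\theta)\,(n\,G(\theta))^{-1}\bigr) \;=\; \frac{m}{n},
\]
with m the number of parameters; the intrinsic version replaces the quadratic form by the squared Riemannian distance on the parameter manifold.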