Article

Enhancing Variational Informational Principles: A Complexified Approach with Arbitrary Order Norms

D. Bernal-Casas 1,* and José M. Oller 2
1 Department of Information and Communication Technologies, Universitat Pompeu Fabra, 08018 Barcelona, Spain
2 Department of Genetics, Microbiology and Statistics, Faculty of Biology, Universitat de Barcelona, 08028 Barcelona, Spain
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(19), 3160; https://doi.org/10.3390/math13193160
Submission received: 1 August 2025 / Revised: 10 September 2025 / Accepted: 23 September 2025 / Published: 2 October 2025

Abstract

This paper offers an innovative exploration of variational informational principles by incorporating complexification and studying norms of arbitrary order, thereby surpassing the limitations of the conventional L² norm. For years, variational principles have been vital for deriving fundamental results in both physics and information theory; however, our proposed framework represents a significant advancement by utilizing complex variables to enhance our understanding of information measures. By employing complex numbers, we introduce a sophisticated structure that captures phase information, thereby significantly improving the potential applicability and scope of variational principles. The inclusion of norms of arbitrary order further expands the scope of optimization problems in information theory, leading to the potential for more creative solutions. Our findings indicate that this extended framework not only maintains the essential characteristics of traditional variational principles but also reveals valuable insights into the complex interplay between complexity, information, and optimization. We conclude with a thoughtful discussion of potential applications and future research directions, emphasizing the transformative impact that complexified variational principles, together with norms of arbitrary order, could have on the study of quantum dynamics.

1. Introduction

Variational principles have long been a cornerstone in the formulation and understanding of physical laws, providing a unifying framework that bridges classical mechanics, quantum theory, and information science. From the principle of least action in classical mechanics to the variational methods applied in quantum field theory, these principles facilitate the derivation of governing equations by optimizing specific functionals. In information theory, variational methods underpin the development of measures such as entropy and divergence, enabling a deeper understanding of information flow and complexity [1,2]. Despite their widespread success, traditional variational approaches often rely on quadratic norms, particularly the L² norm, which can limit their expressive power and the richness of the resulting insights.
Recent advances have sought to extend the scope of variational principles by exploring alternative mathematical structures and optimization criteria. Notably, the incorporation of complex variables has garnered significant interest within quantum physics [3]. Complexification offers a pathway to capture additional degrees of freedom, thereby enriching the descriptive capacity of variational formulations. Moreover, the generalization of norms to arbitrary orders, such as Lᵖ norms, has opened new avenues for tailoring optimization strategies to specific applications, particularly in the context of non-linear systems and information measures [4,5].
We have recently developed a variational principle based on information quantities to unveil the relevant physical laws [6]. This principle has enabled us to derive the Schrödinger equation, which has been utilized in previous studies to determine the stationary states precisely [7,8]. Additionally, it has allowed us to analyze the sample size [9]. The solution of the variational principle [6], being framed in terms of informational parameters rather than physical quantities, has significant implications. Schrödinger’s equation may thus be interpreted as a distinctive physical manifestation of a deeper informational formalism, potentially opening novel avenues for research.
Building upon these developments [6,7,8,9], this work introduces an innovative framework that integrates complex variables into variational informational principles while incorporating norms of arbitrary order. This approach transcends the limitations of conventional quadratic norms, allowing for a more nuanced examination of information measures and their underlying structures. By leveraging complexification, we effectively encode phase information, which is particularly relevant in quantum systems and wave phenomena, thereby providing a richer mathematical landscape for variational analysis. The flexibility introduced by arbitrary order norms further enhances the capacity to optimize diverse information functionals, potentially leading to novel solutions in complex physical and informational settings.
The significance of this extended variational framework lies in its potential to deepen our understanding of the intricate relationship between complexity, information, and physical laws. As quantum technologies continue to evolve, the ability to formulate and solve optimization problems within a complexified, multi-norm context could yield transformative insights into quantum dynamics, entanglement, and information processing. Furthermore, this approach paves the way for new methodologies in classical and quantum statistical mechanics, signal analysis, and machine learning, where non-quadratic measures and phase information are increasingly relevant.

2. Materials and Methods

This study focuses primarily on developing a new mathematical framework. We define the metric, identify the information and its sources, and outline the intricate structure of the parameter space in Section 2.1. Then, we examine the variational principle in Section 2.2, and solve the variational problem in Section 2.3. Section 2.4 illustrates the utility of the framework with a detailed model example.

2.1. Mathematical Framework

Assume the following framework: Let X denote the sample space of the problem under consideration, let A represent the σ-algebra of subsets of X, and let μ be a positive σ-finite measure defined on the measurable space (X, A). In this paper, we define a parametric family of functions as the triple {(X, A, μ); Θ; f}, where (X, A, μ) is a measure space; Θ is a manifold, also referred to as the parameter space; and f : X × Θ → R is a measurable mapping such that f(x, θ) is a probability density, meaning that P_θ(dx) = f(x, θ)μ(dx) defines a probability measure on (X, A) for every θ ∈ Θ. Here, μ is the reference measure and f the model function.
For simplicity, we assume that Θ is an m-dimensional C^∞ real manifold, Hausdorff, connected, and possibly with boundary ∂Θ, while noting that infinite-dimensional Hilbert or Banach manifolds could also be treated within this framework. In many cases, it suffices to consider Θ as a connected open subset of R^m, using the same symbol θ for both points in Θ and their coordinates. We adopt this convention for clarity, while noting that the results extend to more general settings. The model function f is assumed to satisfy the minimal regularity conditions required for the Fisher information matrix to exist, avoiding stronger assumptions. Thus, we work within the essential framework of information geometry, where Θ is viewed as a Riemannian manifold with metric tensor given by its covariant components.
$$ g_{ij}(\theta) = E_\theta\!\left[ \frac{\partial \ln f(X,\theta)}{\partial \theta^i}\, \frac{\partial \ln f(X,\theta)}{\partial \theta^j} \right], \qquad i,j = 1,\dots,m, \tag{1} $$
where X is the random variable whose distribution is given by the probability P_θ(dx) = f(x, θ)μ(dx). The expectation in (1) is obtained by integrating the products of first-order partial derivatives with respect to the underlying probability measure. If G(θ) = (g_{ij}(θ)) denotes the m × m Fisher information matrix and g(θ) = |det G(θ)|, then the Riemannian volume element is V(dθ) = √(g(θ)) dθ. For background, see Rao’s seminal work [10] and further developments in [11,12,13,14], among others.
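As a concrete illustration of (1), the following minimal Python sketch (ours, not from the paper; the model and parameter values are illustrative assumptions) estimates the Fisher metric of a univariate normal location family by Monte Carlo and compares it with the exact value 1/σ_0².

```python
import numpy as np

# Minimal sketch (ours, not from the paper): Monte Carlo estimate of the
# Fisher metric (1) for a univariate normal location family f(x, theta) =
# N(theta, sigma0^2) with known sigma0, where the exact value is 1/sigma0^2.
rng = np.random.default_rng(0)
sigma0, theta = 2.0, 1.5                      # illustrative values

x = rng.normal(theta, sigma0, size=200_000)   # draws from P_theta
score = (x - theta) / sigma0**2               # d/dtheta ln f(x, theta)
g11_mc = np.mean(score**2)                    # E_theta[score^2], cf. (1)

print(f"Monte Carlo g_11 = {g11_mc:.5f}, exact = {1 / sigma0**2:.5f}")
```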
We now define the information carried by the data x relative to the true parameter θ. The log-likelihood ln L_x(θ) = Σ_{i=1}^n ln f(x_i, θ) is a natural candidate; still, it is not invariant under injective transformations of the data (e.g., rescaling). To restore invariance, one may fix a reference point θ_0 ∈ Θ and define the information in x relative to θ, and referred to θ_0, as
$$ I_x(\theta) = -\ln L_x(\theta) - \bigl(-\ln L_x(\theta_0)\bigr) = -\sum_{i=1}^{n} \ln f(x_i,\theta) + \sum_{i=1}^{n} \ln f(x_i,\theta_0). \tag{2} $$
The dependence of (2) on θ 0 is omitted, as it is irrelevant for computing gradients on the parameter manifold. For fixed x , (2) is invariant under both admissible data transformations and coordinate changes on Θ , and thus constitutes a scalar field on the parameter space.
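A quick numerical check of this invariance (our own sketch; the N(θ, 1) sample and the rescaling y = 2x are illustrative assumptions): the raw log-likelihood picks up the Jacobian term −n ln 2 under the transformation, which cancels in (2).

```python
import numpy as np
from scipy.stats import norm

# Minimal sketch (ours; a N(theta, 1) sample and the rescaling y = 2x are
# illustrative choices): the raw log-likelihood changes by the Jacobian term
# -n*log(2) under the transformation, but the relative information (2) does not.
rng = np.random.default_rng(1)
x = rng.normal(0.7, 1.0, size=50)
theta, theta0 = 0.5, 0.0

def loglik(data, th, scale=1.0):
    # density of Y = scale*X when X ~ N(th, 1): f_Y(y) = f(y/scale, th)/scale
    return norm.logpdf(data / scale, loc=th).sum() - data.size * np.log(scale)

I_x = -(loglik(x, theta) - loglik(x, theta0))
I_y = -(loglik(2 * x, theta, scale=2.0) - loglik(2 * x, theta0, scale=2.0))
print(np.isclose(I_x, I_y))  # True: (2) is free of the Jacobian term
```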
Information from an external source is internalized by the observer, enabling them to assess the objects of interest. We focus on two aspects: the parameter space with its natural information geometry, and the observer’s construction of a plausibility function Ψ for the true parameter value in Θ given a sample x. This plausibility is a measurable map Ψ : Θ → C such that ΨΨ̄, up to a non-negative normalization, defines a subjective conditional probability density with respect to the Riemannian volume induced by the information metric on Θ, with Ψ̄ denoting the complex conjugate.
Specifically, once x is given and Ψ is normalized, we will assume that ΨΨ̄ is a probability density with respect to the Riemannian volume, with support in a set Θ_x contained in Θ; thus we shall write, to begin with,
$$ \int_{\Theta} \Psi(\theta)\overline{\Psi(\theta)}\, V(d\theta) = \int_{\Theta_x} \Psi(\theta)\overline{\Psi(\theta)}\, V(d\theta) = a > 0. \tag{3} $$
For reasons that will become apparent, we shall focus primarily on the case a = 1, since in this case ΨΨ̄ will be a probability density, with respect to the Riemannian volume, concentrated in the closure of the set where, for a given x, the likelihood is strictly positive. In (3), integration is with respect to the Riemannian volume induced by the metric (1), ensuring invariance under coordinate changes.
If we define a probability on the parameter manifold, representing the propensity of a parameter to be true as in Bayesian statistics, the normalized function ΨΨ̄ on Θ can be taken as its Radon–Nikodym derivative with respect to the Riemannian volume, which is itself a positive measure on Θ (see [15]). Both measures are coordinate-independent, so ΨΨ̄ is an invariant scalar field on Θ. We may then define the information encoded by the subjective plausibility Ψ, relative to the true parameter θ, as
$$ \Lambda(\theta) = -\ln\bigl(\Psi\overline{\Psi}\bigr)(\theta). \tag{4} $$
The quantity (4) remains invariant under coordinate changes on the parameter manifold, and thus constitutes a scalar field on Θ .
Note, at this point, that there are infinitely many ways of constructing, given x, the aforementioned plausibility Ψ. In Section 2.2 we present some variational procedures to obtain it.

2.2. An Extended Variational Principle

Assuming that the observer’s abilities have been shaped by natural selection, we can posit that subjective information adapts to the source information and, in particular, satisfies the following variational principle:
$$ \Omega_\alpha(\Psi) = \int_{\Theta} \frac{1}{n}\,\bigl|\operatorname{grad} I_x(\theta) - \operatorname{grad}\Lambda(\theta)\bigr|^{\alpha}\, \bigl(\Psi\overline{\Psi}\bigr)(\theta)\, V(d\theta), \tag{5} $$
which is required to be a minimum, or at least stationary, subject to the constraint (3) for a suitable α > 0, assuming that Ψ is constant on ∂Θ_x (hence on ∂Θ) or vanishes at infinity, with grad Ψ = 0 on ∂Θ_x (and thus on ∂Θ) or at infinity. These conditions reflect the assumption that the true parameter θ lies within Θ_x ⊂ Θ. The functional Ω_α is, up to normalization, the expected value of a power of the norm, corresponding to the Riemannian metric in (1), of the difference between the gradients grad I_x(θ) and grad Λ(θ), namely |grad I_x(θ) − grad Λ(θ)|^α, with the expectation taken with respect to the probability on Θ given by the density ΨΨ̄ and the Riemannian volume V(dθ). Equation (5) is invariant under coordinate changes, since both the norm power and (ΨΨ̄)(θ)V(dθ) = (ΨΨ̄)(θ)√(g(θ)) dθ are invariant. The source is treated as objective (or intersubjective), while the parameter space, with its geometric structure, is in part observer dependent, although strongly constrained by the source.
Any change in the information encoded by x, due to a modification in the source in the parameter space, should correspond to a change in the subjective information of the observer. Consequently, the α-th power of the norm of the difference between grad I_x and grad Λ, divided by the sample size n, should be, on average, locally minimized; that is, Ω_α should be as small as possible.
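To make the principle concrete, the following minimal sketch (our own construction, not from the paper) evaluates Ω_α numerically for α = 2 on a one-parameter model N(θ, σ_0²) with basic-model metric g = 1/σ_0² and n observations; the sample and the trial densities are illustrative assumptions. The functional vanishes when ΨΨ̄ has the shape of the normalized likelihood, a N(x̄, σ_0²/n) density, and grows when ΨΨ̄ is displaced from it.

```python
import numpy as np

# Minimal sketch (our own construction): numerical evaluation of (5) with
# alpha = 2 for a one-parameter model N(theta, sigma0^2), basic-model metric
# g = 1/sigma0^2, and n observations; all concrete values are illustrative.
rng = np.random.default_rng(2)
n, sigma0 = 25, 1.0
x = rng.normal(0.3, sigma0, size=n)
xbar = x.mean()

theta = np.linspace(xbar - 4, xbar + 4, 8001)
dth = theta[1] - theta[0]
sqrt_g = 1.0 / sigma0                         # Riemannian volume element

def omega2(m0, s2):
    psi2 = np.exp(-(theta - m0) ** 2 / (2 * s2))      # trial Psi * conj(Psi)
    psi2 /= np.trapz(psi2 * sqrt_g, theta)            # constraint (3), a = 1
    grad_I = n * (theta - xbar)                       # g^{-1} dI_x/dtheta
    grad_Lam = -sigma0**2 * np.gradient(np.log(psi2), dth)
    integrand = (grad_I - grad_Lam) ** 2 / sigma0**2  # |v|^2 = g v^2 in 1-D
    return np.trapz(integrand / n * psi2 * sqrt_g, theta)

print(omega2(xbar, sigma0**2 / n))        # ~0: likelihood-shaped density
print(omega2(xbar + 0.5, sigma0**2 / n))  # displaced: strictly larger
```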

2.3. Solving the Extended Variational Problem

Equation (5) defines a class of optimization problems subject to at least the constraint (3), so we may introduce the augmented Lagrangian
$$ \mathcal{L}_{\lambda,a,\alpha}(\Psi) = \int_{\Theta} \frac{1}{n}\,\bigl|\operatorname{grad} I_x(\theta) - \operatorname{grad}\Lambda(\theta)\bigr|^{\alpha} \bigl(\Psi\overline{\Psi}\bigr)(\theta)\, V(d\theta) - \lambda\left( \int_{\Theta} \bigl(\Psi\overline{\Psi}\bigr)(\theta)\, V(d\theta) - a \right), \tag{6} $$
with λ being a Lagrange multiplier. Notice also that grad Λ = −grad(ln(ΨΨ̄)) = −(1/(ΨΨ̄)) grad(ΨΨ̄), an expression implicitly dependent on x and invariant under coordinate changes on Θ. Let η(θ) be an arbitrary smooth complex-valued function, and assume Ψ + εη satisfies (3). Omitting explicit θ-dependence for simplicity (writing I_x, Ψ, and η instead of I_x(θ), Ψ(θ), and η(θ)), we then have
$$ \mathcal{L}_{\lambda,a,\alpha}(\Psi+\epsilon\eta) = \int_{\Theta} \frac{1}{n}\left| \frac{\operatorname{grad}\bigl((\Psi+\epsilon\eta)\overline{(\Psi+\epsilon\eta)}\bigr)}{(\Psi+\epsilon\eta)\overline{(\Psi+\epsilon\eta)}} - \operatorname{grad}\ln L_x \right|^{\alpha} (\Psi+\epsilon\eta)\overline{(\Psi+\epsilon\eta)}\, V(d\theta) - \lambda\left( \int_{\Theta} (\Psi+\epsilon\eta)\overline{(\Psi+\epsilon\eta)}\, V(d\theta) - a \right). \tag{7} $$
If we write Ψ(θ) = Ψ_1(θ) + iΨ_2(θ), where Ψ_1 and Ψ_2 are the real and imaginary parts of Ψ and may depend on x, and likewise consider the real and imaginary parts of η, η(θ) = η_1(θ) + iη_2(θ), and define A = (Ψ_1 + εη_1)² + (Ψ_2 + εη_2)², where A is a function of θ and ε, then, defining A_0 = A|_{ε=0} = Ψ_1² + Ψ_2², we will have
$$ \mathcal{L}_{\lambda,a,\alpha}(\Psi+\epsilon\eta) = \int_{\Theta} \frac{1}{n}\left| \frac{1}{A}\operatorname{grad} A - \operatorname{grad}\ln L_x \right|^{\alpha} A\, V(d\theta) - \lambda\left( \int_{\Theta} A\, V(d\theta) - a \right). \tag{8} $$
Notice also that
$$ \frac{dA}{d\epsilon} = 2(\Psi_1+\epsilon\eta_1)\eta_1 + 2(\Psi_2+\epsilon\eta_2)\eta_2. \tag{9} $$
Since grad A = 2(Ψ_1 + εη_1)(grad Ψ_1 + ε grad η_1) + 2(Ψ_2 + εη_2)(grad Ψ_2 + ε grad η_2), and taking into account that grad(fg) = f grad g + g grad f, if we define C_0 = Ψ_1η_1 + Ψ_2η_2, a constant with respect to ε but a function of θ, then, under the assumption of sufficiently smooth functions, we have
$$ \frac{d\,\operatorname{grad} A}{d\epsilon} = 2\operatorname{grad}(\Psi_1\eta_1) + 4\epsilon\,\eta_1\operatorname{grad}\eta_1 + 2\operatorname{grad}(\Psi_2\eta_2) + 4\epsilon\,\eta_2\operatorname{grad}\eta_2. \tag{10} $$
Therefore, taking into account that C_0 = Ψ_1η_1 + Ψ_2η_2, we have
$$ \left.\frac{dA}{d\epsilon}\right|_{\epsilon=0} = 2\Psi_1\eta_1 + 2\Psi_2\eta_2 = 2C_0, \tag{11} $$
and
$$ \left.\frac{d\,\operatorname{grad} A}{d\epsilon}\right|_{\epsilon=0} = 2\operatorname{grad}(\Psi_1\eta_1) + 2\operatorname{grad}(\Psi_2\eta_2) = 2\operatorname{grad} C_0. \tag{12} $$
If we define B as
$$ B = \left| \frac{1}{A}\operatorname{grad} A - \operatorname{grad}\ln L_x \right|^{2} = \bigl\langle \operatorname{grad}\ln(A/L_x),\, \operatorname{grad}\ln(A/L_x) \bigr\rangle, \tag{13} $$
and B_0 = B|_{ε=0}, then we have
$$ \frac{dB}{d\epsilon} = 2\left\langle \operatorname{grad}\ln(A/L_x),\, \frac{d}{d\epsilon}\bigl(\operatorname{grad}\ln A - \operatorname{grad}\ln L_x\bigr) \right\rangle = 2\left\langle \operatorname{grad}\ln(A/L_x),\, -\frac{1}{A^{2}}\frac{dA}{d\epsilon}\operatorname{grad} A + \frac{1}{A}\frac{d\,\operatorname{grad} A}{d\epsilon} \right\rangle, \tag{14} $$
and
$$ \left.\frac{dB}{d\epsilon}\right|_{\epsilon=0} = 2\left\langle \operatorname{grad}\ln(A_0/L_x),\, -\frac{2C_0}{A_0^{2}}\operatorname{grad} A_0 + \frac{2}{A_0}\operatorname{grad} C_0 \right\rangle, \tag{15} $$
obtaining the first variation of the Lagrangian (6) as
$$ \delta \mathcal{L}_{\lambda,a,\alpha}(\Psi,\eta) \equiv \left.\frac{d\mathcal{L}_{\lambda,a,\alpha}}{d\epsilon}\right|_{\epsilon=0} = \int_{\Theta} \frac{1}{n}\left( \frac{\alpha}{2}\, B_0^{\frac{\alpha}{2}-1}\, 2\left\langle \operatorname{grad}\ln(A_0/L_x),\, -\frac{2C_0}{A_0}\operatorname{grad} A_0 + 2\operatorname{grad} C_0 \right\rangle + 2B_0^{\frac{\alpha}{2}} C_0 - 2\lambda n C_0 \right) V(d\theta). \tag{16} $$
Taking into account that B_0^{α/2−1} grad C_0 = grad(B_0^{α/2−1} C_0) − C_0 grad(B_0^{α/2−1}), we have
$$ \delta \mathcal{L}_{\lambda,a,\alpha}(\Psi,\eta) = \frac{2}{n}\int_{\Theta} \Bigl[ \alpha\bigl\langle \operatorname{grad}\ln(A_0/L_x),\, \operatorname{grad}\bigl(B_0^{\frac{\alpha}{2}-1}C_0\bigr) \bigr\rangle - \Bigl( \alpha\bigl\langle \operatorname{grad}\ln(A_0/L_x),\, B_0^{\frac{\alpha}{2}-1}\operatorname{grad}\ln A_0 \bigr\rangle + \alpha\bigl\langle \operatorname{grad}\ln(A_0/L_x),\, \operatorname{grad}\bigl(B_0^{\frac{\alpha}{2}-1}\bigr) \bigr\rangle - B_0^{\frac{\alpha}{2}} + \lambda n \Bigr) C_0 \Bigr] V(d\theta). \tag{17} $$
Since C_0 = Ψ_1η_1 + Ψ_2η_2 = Σ_{j=1}^{2} Ψ_jη_j and grad C_0 = Σ_{j=1}^{2} grad(Ψ_jη_j), we have
$$ \delta \mathcal{L}_{\lambda,a,\alpha}(\Psi,\eta) = \frac{2}{n}\sum_{j=1}^{2}\int_{\Theta} \Bigl[ \alpha\bigl\langle \operatorname{grad}\ln(A_0/L_x),\, \operatorname{grad}\bigl(B_0^{\frac{\alpha}{2}-1}\Psi_j\eta_j\bigr) \bigr\rangle - \Bigl( \alpha\bigl\langle \operatorname{grad}\ln(A_0/L_x),\, B_0^{\frac{\alpha}{2}-1}\operatorname{grad}\ln A_0 \bigr\rangle + \alpha\bigl\langle \operatorname{grad}\ln(A_0/L_x),\, \operatorname{grad}\bigl(B_0^{\frac{\alpha}{2}-1}\bigr) \bigr\rangle - B_0^{\frac{\alpha}{2}} + \lambda n \Bigr) \Psi_j\eta_j \Bigr] V(d\theta). \tag{18} $$
On the other hand, observe that by the Gauss divergence theorem, we have the following
$$ \int_{\Theta} \operatorname{div}\Bigl( B_0^{\frac{\alpha}{2}-1}\eta_j\Psi_j\, \operatorname{grad}\ln(A_0/L_x) \Bigr) V(d\theta) = \int_{\partial\Theta} B_0^{\frac{\alpha}{2}-1}\eta_j\Psi_j\, \bigl\langle \operatorname{grad}\ln(A_0/L_x),\, \nu \bigr\rangle\, S(d\theta) = 0, \tag{19} $$
where ν is the outward-pointing unit normal vector field on ∂Θ, and S(dθ) is the surface element of the boundary induced by the Riemannian metric on Θ. Since, by the boundary conditions, η vanishes on ∂Θ or at infinity, the boundary integral vanishes, which yields (19). Then, since
$$ \bigl\langle \operatorname{grad}\ln(A_0/L_x),\, \operatorname{grad}\bigl(B_0^{\frac{\alpha}{2}-1}\Psi_j\eta_j\bigr) \bigr\rangle = -B_0^{\frac{\alpha}{2}-1}\Psi_j\eta_j\, \Delta\ln(A_0/L_x) + \operatorname{div}\Bigl( B_0^{\frac{\alpha}{2}-1}\eta_j\Psi_j\, \operatorname{grad}\ln(A_0/L_x) \Bigr), \tag{20} $$
where Δ denotes the Laplace operator, we have
$$ \int_{\Theta} \bigl\langle \operatorname{grad}\ln(A_0/L_x),\, \operatorname{grad}\bigl(B_0^{\frac{\alpha}{2}-1}\Psi_j\eta_j\bigr) \bigr\rangle\, V(d\theta) = -\int_{\Theta} B_0^{\frac{\alpha}{2}-1}\Psi_j\eta_j\, \Delta\ln(A_0/L_x)\, V(d\theta). \tag{21} $$
Therefore,
$$ \delta \mathcal{L}_{\lambda,a,\alpha}(\Psi,\eta) = -\frac{2}{n}\sum_{j=1}^{2}\int_{\Theta} \Bigl( \alpha B_0^{\frac{\alpha}{2}-1}\, \Delta\ln(A_0/L_x) + \alpha\bigl\langle \operatorname{grad}\ln(A_0/L_x),\, B_0^{\frac{\alpha}{2}-1}\operatorname{grad}\ln A_0 \bigr\rangle + \alpha\bigl\langle \operatorname{grad}\ln(A_0/L_x),\, \operatorname{grad}\bigl(B_0^{\frac{\alpha}{2}-1}\bigr) \bigr\rangle - B_0^{\frac{\alpha}{2}} + \lambda n \Bigr) \Psi_j\eta_j\, V(d\theta). \tag{22} $$
However, this first variation must vanish for arbitrary η_1 and η_2. Therefore, we arrive at
$$ -\alpha B_0^{\frac{\alpha}{2}-1}\, \Delta\ln(A_0/L_x) + B_0^{\frac{\alpha}{2}} = \alpha\bigl\langle \operatorname{grad}\ln(A_0/L_x),\, B_0^{\frac{\alpha}{2}-1}\operatorname{grad}\ln A_0 + \operatorname{grad}\bigl(B_0^{\frac{\alpha}{2}-1}\bigr) \bigr\rangle + \lambda n, \tag{23} $$
which may be written as the fundamental equation
$$ -\alpha\,\Delta A_0 + \alpha A_0\, \Delta\ln L_x + A_0 B_0 = \alpha\bigl\langle \operatorname{grad} A_0,\, B_0^{-\frac{\alpha}{2}+1}\operatorname{grad}\bigl(B_0^{\frac{\alpha}{2}-1}\bigr) \bigr\rangle - \alpha\bigl\langle \operatorname{grad}\ln L_x,\, \operatorname{grad} A_0 + A_0 B_0^{-\frac{\alpha}{2}+1}\operatorname{grad}\bigl(B_0^{\frac{\alpha}{2}-1}\bigr) \bigr\rangle + \lambda n\, A_0 B_0^{-\frac{\alpha}{2}+1}. \tag{24} $$
If we set Υ = (ΨΨ̄)^{1/2}, then A_0 = Υ² and B_0 = |grad ln Υ² − grad ln L_x|², and the fundamental Equation (24) becomes
$$ -2\alpha\,\Delta\Upsilon + \bigl|\operatorname{grad}\ln L_x\bigr|^{2}\Upsilon = \Bigl( -\alpha\,\Delta\ln L_x + \lambda n\, \bigl|\operatorname{grad}\ln(\Upsilon^{2}/L_x)\bigr|^{2-\alpha} \Bigr)\Upsilon + \alpha\Bigl\langle 2\operatorname{grad}\Upsilon - \Upsilon\operatorname{grad}\ln L_x,\; \bigl|\operatorname{grad}\ln(\Upsilon^{2}/L_x)\bigr|^{2-\alpha}\, \operatorname{grad}\Bigl( \bigl|\operatorname{grad}\ln(\Upsilon^{2}/L_x)\bigr|^{\alpha-2} \Bigr) \Bigr\rangle + \frac{2\alpha-4}{\Upsilon}\bigl|\operatorname{grad}\Upsilon\bigr|^{2} + (4-2\alpha)\bigl\langle \operatorname{grad}\Upsilon,\, \operatorname{grad}\ln L_x \bigr\rangle. \tag{25} $$
Equation (25) yields the stationary points of the variational problem (5) under the constraint (3), which are not necessarily minimum points. Observe that (25) is in fact an equation that only restricts the modulus of the solution Ψ but leaves its argument arbitrary; that is, the solutions sought by the variational problem (5) are of the form Ψ(θ) = Υ(θ)e^{ih(θ)}, with h an arbitrary real-valued function on Θ.
If we choose a = 1 , then we obtain a direct probabilistic interpretation, that is, Υ 2 as a probability density with respect to the Riemannian volume. In this case, the equations obtained can be reinterpreted as a procedure for obtaining, from the available data, a Bayesian posterior distribution without the explicit intervention of a prior distribution.
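As a consistency check of Equation (25) (ours, not from the paper), one can verify symbolically that for α = 2, λ = 0, and a one-dimensional flat parameter space, (25) reduces to −4Υ'' + ((ln L_x)')²Υ = −2(ln L_x)''Υ and is solved by Υ ∝ √L_x, in line with the likelihood-shaped densities discussed above.

```python
import sympy as sp

# Symbolic check (ours, not from the paper): for alpha = 2, lambda = 0, and a
# one-dimensional flat parameter space, Equation (25) reduces to
# -4*Upsilon'' + (ln L)'^2 * Upsilon = -2*(ln L)'' * Upsilon,
# and Upsilon proportional to sqrt(L_x) solves it.
theta = sp.symbols('theta')
L = sp.Function('L', positive=True)(theta)

Upsilon = sp.sqrt(L)
lnL = sp.log(L)
lhs = -4 * sp.diff(Upsilon, theta, 2) + sp.diff(lnL, theta) ** 2 * Upsilon
rhs = -2 * sp.diff(lnL, theta, 2) * Upsilon   # lambda = 0 case
print(sp.simplify(lhs - rhs))                 # 0
```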

2.4. The Model

The basic model we propose describes a situation with two approximately independent sources of information and several replicas of sizes k_1 > 0 and k_2 > 0, respectively, not necessarily equal to each other, although k_1 and k_2 may be linked, as we will discuss later. Based on this basic model, we will assume that we have a random sample of size n > 0 for both subsamples of sizes k_1 and k_2; that is, we will base our calculations on a total of n(k_1 + k_2) independent data.
Specifically, let X be an m-variate normal random vector with mean μ ∈ R^m and known strictly positive definite covariance matrix Σ_0 = (σ_{jk}), j, k = 1,…,m, that is, X ~ N_m(μ, Σ_0), and suppose we have at our disposal k_1 independent, identically distributed copies of X, with joint absolutely continuous density equal to
$$ f_X(x_1,\dots,x_{k_1},\mu) = \prod_{i=1}^{k_1} (2\pi)^{-\frac{m}{2}} \det(\Sigma_0)^{-\frac12}\, e^{-\frac12 (x_i-\mu)^{T}\Sigma_0^{-1}(x_i-\mu)} = (2\pi)^{-\frac{m k_1}{2}} \det(\Sigma_0)^{-\frac{k_1}{2}}\, e^{-\frac{k_1}{2}\operatorname{Tr}\left(\Sigma_0^{-1} S_{k_1}\right)}\, e^{-\frac{k_1}{2} (\bar{x}_{k_1}-\mu)^{T}\Sigma_0^{-1}(\bar{x}_{k_1}-\mu)}. \tag{26} $$
We identify the elements of R^m with m-column vectors as needed, and we define x̄_{k_1} = (x_1 + ⋯ + x_{k_1})/k_1 and S_{k_1} = (1/k_1) Σ_{i=1}^{k_1} (x_i − x̄_{k_1})(x_i − x̄_{k_1})^T. In (26), det and Tr denote the determinant and trace operators applied to the covariance matrix defined above.
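As a sanity check of the factorization in (26), the following sketch (with illustrative dimensions and parameter values of our choosing) compares the summed log-densities with the sufficient-statistic form built from x̄_{k_1} and S_{k_1}.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Minimal sketch (illustrative dimensions and parameters): the summed
# log-densities agree with the sufficient-statistic factorization in (26).
rng = np.random.default_rng(3)
m, k1 = 3, 40
mu = np.array([1.0, -0.5, 2.0])
Sigma0 = np.array([[2.0, 0.3, 0.0],
                   [0.3, 1.0, 0.2],
                   [0.0, 0.2, 1.5]])

X = rng.multivariate_normal(mu, Sigma0, size=k1)
xbar = X.mean(axis=0)
S = (X - xbar).T @ (X - xbar) / k1
P = np.linalg.inv(Sigma0)

lhs = multivariate_normal(mu, Sigma0).logpdf(X).sum()
rhs = (-m * k1 / 2 * np.log(2 * np.pi)
       - k1 / 2 * np.log(np.linalg.det(Sigma0))
       - k1 / 2 * np.trace(P @ S)
       - k1 / 2 * (xbar - mu) @ P @ (xbar - mu))
print(np.isclose(lhs, rhs))  # True
```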
Additionally, we will assume that we have at our disposal k_2 independent, identically distributed copies of a random variable T, with joint absolutely continuous density equal to
$$ f_T(t_1,\dots,t_{k_2},\beta) = \prod_{i=1}^{k_2} \Bigl\{ \alpha_0\, e^{-\alpha_0(\beta - t_i)}\, \mathbf{1}_{(-\infty,\beta]}(t_i) \Bigr\} = \alpha_0^{k_2}\, e^{-\alpha_0 k_2 (\beta - \bar{t}_{k_2})}\, \mathbf{1}_{[t_{(k_2)},\infty)}(\beta), \tag{27} $$
where t_{(1)} ≤ ⋯ ≤ t_{(k_2)} is the ordered sample; 1_{[t_{(k_2)},∞)} is the characteristic function of the interval [t_{(k_2)}, ∞); t, β ∈ R; α_0 is a known positive constant; and t_{(k_2)} = max{t_1,…,t_{k_2}}. Observe that although this model is not regular, it is still possible to define the information metric, as in [16].
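The one-sided model (27) can be simulated by writing T = β − U with U ~ Exp(α_0); the sketch below (illustrative values of our choosing) checks that the product of the individual densities matches the closed form in (27) and that the support constraint β ≥ t_{(k_2)} holds.

```python
import numpy as np

# Minimal sketch (illustrative values): simulate the one-sided model (27) via
# T = beta - U with U ~ Exp(alpha0), and check that the product of individual
# densities matches the closed form, with support constraint beta >= t_(k2).
rng = np.random.default_rng(4)
alpha0, beta, k2 = 1.3, 5.0, 30

t = beta - rng.exponential(1 / alpha0, size=k2)   # support (-inf, beta]
log_lhs = np.sum(np.log(alpha0) - alpha0 * (beta - t))
log_rhs = k2 * np.log(alpha0) - alpha0 * k2 * (beta - t.mean())
print(np.isclose(log_lhs, log_rhs), t.max() <= beta)  # True True
```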
Combining (26) and (27) we obtain the basic model
$$ f(x_1,\dots,x_{k_1}, t_1,\dots,t_{k_2}, \mu, \beta) = (2\pi)^{-\frac{m k_1}{2}} \det(\Sigma_0)^{-\frac{k_1}{2}}\, e^{-\frac{k_1}{2}\operatorname{Tr}\left(\Sigma_0^{-1} S_{k_1}\right)}\, e^{-\frac{k_1}{2} (\bar{x}_{k_1}-\mu)^{T}\Sigma_0^{-1}(\bar{x}_{k_1}-\mu)}\; \alpha_0^{k_2}\, e^{-\alpha_0 k_2 (\beta - \bar{t}_{k_2})}\, \mathbf{1}_{[t_{(k_2)},\infty)}(\beta). \tag{28} $$
The parameter space is the (m + 1)-dimensional manifold Θ = R^{m+1}, with points θ = (μ^1,…,μ^m, β) and basis vector field (∂/∂μ^1,…,∂/∂μ^m, ∂/∂β). For clarity, some results will be stated as propositions, even when elementary.
Proposition 1.
The (m + 1) × (m + 1) Fisher information matrix G of the model, in the coordinates (μ^1,…,μ^m, β), is
$$ G = \begin{pmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{pmatrix}, \tag{29} $$
where G_{11} is (m × m), G_{12} = G_{21}^T is (m × 1), and G_{22} is a (1 × 1) block matrix. All the elements of these block matrices depend on the coordinates mentioned above, and the Fisher information matrix G and its inverse are equal to
$$ G = \begin{pmatrix} k_1 \Sigma_0^{-1} & 0 \\ 0^{T} & k_2^{2}\alpha_0^{2} \end{pmatrix}, \qquad G^{-1} = \begin{pmatrix} \frac{1}{k_1}\Sigma_0 & 0 \\ 0^{T} & \frac{1}{k_2^{2}\alpha_0^{2}} \end{pmatrix}, \tag{30} $$
and its determinant, together with the square root of the latter, follows as a consequence. Then we have
$$ g = \det(G) = k_1^{m}\, k_2^{2}\alpha_0^{2}\, \det(\Sigma_0)^{-1}, \qquad \sqrt{g} = k_1^{\frac{m}{2}}\, k_2\,\alpha_0\, \det(\Sigma_0)^{-\frac12}. \tag{31} $$
Proof. 
The proof is just a computation. First, observe that, in matrix notation,
$$ \frac{\partial \ln f_X(x_1,\dots,x_{k_1},\mu)}{\partial \mu} = k_1 \Sigma_0^{-1}(\bar{x}_{k_1} - \mu), \tag{32} $$
and therefore
$$ G_{11} = E_\mu\Bigl[ k_1^{2}\, \Sigma_0^{-1}(\bar{X}_{k_1}-\mu)(\bar{X}_{k_1}-\mu)^{T}\Sigma_0^{-1} \Bigr] = k_1^{2}\, \Sigma_0^{-1} E_\mu\Bigl[ (\bar{X}_{k_1}-\mu)(\bar{X}_{k_1}-\mu)^{T} \Bigr] \Sigma_0^{-1} = k_1^{2}\, \Sigma_0^{-1}\, \frac{1}{k_1}\Sigma_0\, \Sigma_0^{-1} = k_1 \Sigma_0^{-1}, \tag{33} $$
where X̄_{k_1} = (1/k_1)(X_1 + ⋯ + X_{k_1}), and it follows that X̄_{k_1} ~ N_m(μ, (1/k_1)Σ_0).
Since the X_i and T_j are independent and E_μ[∂ ln f(X_1,…,X_{k_1}, μ)/∂μ] = 0, we have G_{12} = G_{21}^T = 0; moreover, since, with probability one,
$$ \frac{\partial \ln f_T(t_1,\dots,t_{k_2},\beta)}{\partial \beta} = -\alpha_0 k_2, \tag{34} $$
we obtain G_{22} = k_2²α_0², establishing (30). The inverse G^{−1}, the determinant g = det(G), and its square root follow directly, completing the proof. □
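Proposition 1 can also be checked by simulation (a sketch with illustrative values of our choosing): the covariance of the μ-score k_1Σ_0^{−1}(X̄_{k_1} − μ) approximates k_1Σ_0^{−1}, while the β-score is the constant −α_0k_2, so that G_{22} = k_2²α_0² holds exactly.

```python
import numpy as np

# Minimal sketch (illustrative values): Monte Carlo check of Proposition 1.
# The mu-score is k1 * inv(Sigma0) @ (Xbar - mu); its covariance approximates
# k1 * inv(Sigma0), and the beta-score is the constant -alpha0*k2, so that
# G22 = k2^2 * alpha0^2 exactly.
rng = np.random.default_rng(5)
k1, k2, alpha0 = 10, 7, 0.8
mu = np.zeros(2)
Sigma0 = np.array([[1.0, 0.4],
                   [0.4, 2.0]])
P = np.linalg.inv(Sigma0)

reps = 100_000
xbar = rng.multivariate_normal(mu, Sigma0 / k1, size=reps)  # law of Xbar_k1
scores = (k1 * (xbar - mu)) @ P        # rows: k1 * P @ (xbar - mu), P symmetric
print(np.round(scores.T @ scores / reps, 2))  # approx k1 * inv(Sigma0) = G11
print(np.round(k1 * P, 2))
print((alpha0 * k2) ** 2)                     # G22, exact
```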
Assume now that we have a particular sample of n > 0 independent copies of (X_1,…,X_{k_1}) and (T_1,…,T_{k_2}); that is, (x_{i1},…,x_{ik_1}) and (t_{i1},…,t_{ik_2}) for i = 1,…,n, where n, k_1, and k_2 are positive integers. After some tedious but straightforward computation, the joint likelihood of the model will be
$$ L_{(x,t)}(\mu,\beta) = (2\pi)^{-\frac{m n k_1}{2}} \det(\Sigma_0)^{-\frac{n k_1}{2}}\, e^{-\frac{n k_1}{2}\operatorname{Tr}\left(\Sigma_0^{-1} S_{n k_1}\right)}\, e^{-\frac{n k_1}{2} (\bar{x}_{n k_1}-\mu)^{T}\Sigma_0^{-1}(\bar{x}_{n k_1}-\mu)}\; \alpha_0^{n k_2}\, e^{-\alpha_0 n k_2 (\beta - \bar{t}_{n k_2})}\, \mathbf{1}_{[t_{(n k_2)},\infty)}(\beta). \tag{35} $$
We regard elements of R^m as column vectors and set x̄_{nk_1} = (x_{11} + ⋯ + x_{nk_1})/(nk_1), S_{nk_1} = (1/(nk_1)) Σ_{i=1}^{n} Σ_{j=1}^{k_1} (x_{ij} − x̄_{nk_1})(x_{ij} − x̄_{nk_1})^T, t_{(nk_2)} = max(t_{11},…,t_{nk_2}), and t̄_{nk_2} = (t_{11} + ⋯ + t_{nk_2})/(nk_2). We take the reference point (μ_0, β_0), with μ_0 arbitrary and β_0 = t_{(nk_2)}. The Mahalanobis distance is then d_M²(μ, x̄_{nk_1}) = (μ − x̄_{nk_1})^T Σ_0^{−1} (μ − x̄_{nk_1}) (see [17]), and the likelihood ratio with respect to the reference point is
$$ \frac{L_{(x,t)}(\mu,\beta)}{L_{(x,t)}(\mu_0,\beta_0)} = e^{-\frac{n k_1}{2}\left[ d_M^{2}(\mu,\bar{x}_{n k_1}) - d_M^{2}(\mu_0,\bar{x}_{n k_1}) \right]}\, e^{-\alpha_0 n k_2 (\beta - t_{(n k_2)})}, \qquad \beta \geq t_{(n k_2)}. \tag{36} $$
Let us consider fixed sample values x and t, and additionally suppose that we have strong reasons to believe that the density ΨΨ̄ is absolutely continuous with respect to the Riemannian volume and is concentrated on Θ_{(x,t)} = R^m × (t_{(nk_2)}, t_max), where t_max is an arbitrarily large real number greater than t_{(nk_2)}. Therefore, the information I_{(x,t)}(μ, β) encoded by the data (x, t), relative to the true parameter (μ, β) and referred to the reference point (μ_0, β_0), with β_0 = t_{(nk_2)}, defined as minus the logarithm of (36), is
$$ I_{(x,t)}(\mu,\beta) = \alpha_0 n k_2 (\beta - t_{(n k_2)}) + \frac{n k_1}{2}\left[ d_M^{2}(\mu,\bar{x}_{n k_1}) - d_M^{2}(\mu_0,\bar{x}_{n k_1}) \right], \qquad t_{(n k_2)} \leq \beta \leq t_{max}. \tag{37} $$
Then, we have the next proposition.
Proposition 2.
The partial derivatives of I ( x , t ) defined in (37), using the summation convention of repeated indices and classical notation, where a comma preceding an index indicates the covariant derivative in the direction of the coordinate indicated by the index, are given by
$$ (I_{(x,t)})_{,\alpha} = \frac{\partial I_{(x,t)}}{\partial \mu^{\alpha}} = n k_1\, \sigma^{\alpha\gamma}\bigl(\mu_\gamma - \bar{x}_{n k_1,\gamma}\bigr), \qquad \alpha = 1,\dots,m, \tag{38} $$
$$ (I_{(x,t)})_{,(m+1)} = \frac{\partial I_{(x,t)}}{\partial \beta} = n k_2 \alpha_0, \qquad t_{(n k_2)} \leq \beta \leq t_{max}, \tag{39} $$
where we have defined the derivatives at β = t ( n k 2 ) as the right derivatives. The gradient of I ( x , t ) using matrix notation is
$$ \operatorname{grad} I_{(x,t)} = \begin{pmatrix} n(\mu - \bar{x}_{n k_1}) \\ \frac{n}{k_2 \alpha_0} \end{pmatrix}, \qquad t_{(n k_2)} \leq \beta \leq t_{max}, \tag{40} $$
and the square of its norm will be given by the scalar field
$$ \bigl|\operatorname{grad} I_{(x,t)}\bigr|^{2} = n^{2}\bigl( k_1\, d_M^{2}(\mu,\bar{x}_{n k_1}) + 1 \bigr), \qquad t_{(n k_2)} \leq \beta \leq t_{max}. \tag{41} $$
Proof. 
Accounting for the symmetry of the covariance matrix and its inverse, with elements σ^{αγ} (superscripts without tensorial meaning), we obtain (38) and (39). For the gradient computation, from (30) and using matrix notation, we have
$$ \operatorname{grad} I_{(x,t)} = \begin{pmatrix} \frac{1}{k_1}\Sigma_0 & 0 \\ 0^{T} & \frac{1}{k_2^{2}\alpha_0^{2}} \end{pmatrix} \begin{pmatrix} n k_1\, \Sigma_0^{-1}(\mu - \bar{x}_{n k_1}) \\ n k_2 \alpha_0 \end{pmatrix} = \begin{pmatrix} n(\mu - \bar{x}_{n k_1}) \\ \frac{n}{k_2 \alpha_0} \end{pmatrix}, \qquad t_{(n k_2)} \leq \beta \leq t_{max}, \tag{42} $$
obtaining (40). The square of its norm is
$$ \bigl|\operatorname{grad} I_{(x,t)}\bigr|^{2} = \begin{pmatrix} n(\mu - \bar{x}_{n k_1})^{T} & \frac{n}{k_2 \alpha_0} \end{pmatrix} \begin{pmatrix} k_1 \Sigma_0^{-1} & 0 \\ 0^{T} & k_2^{2}\alpha_0^{2} \end{pmatrix} \begin{pmatrix} n(\mu - \bar{x}_{n k_1}) \\ \frac{n}{k_2 \alpha_0} \end{pmatrix} = n^{2}\bigl( k_1\, d_M^{2}(\mu,\bar{x}_{n k_1}) + 1 \bigr), \qquad t_{(n k_2)} \leq \beta \leq t_{max}, \tag{43} $$
obtaining (41). □
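A small numerical check of Proposition 2 (ours, with illustrative values): contracting the partial derivatives (38) and (39) with G^{−1} from (30) reproduces the closed form (41).

```python
import numpy as np

# Minimal sketch (illustrative values): contracting the partial derivatives
# (38)-(39) with G^{-1} from (30) reproduces the closed form (41).
n, k1, k2, alpha0 = 5, 4, 6, 1.1
Sigma0 = np.array([[1.5, 0.2],
                   [0.2, 0.9]])
P = np.linalg.inv(Sigma0)
xbar = np.array([0.3, -0.7])
mu = np.array([1.0, 0.5])

dI_mu = n * k1 * P @ (mu - xbar)            # (38) in matrix form
dI_beta = n * k2 * alpha0                   # (39)
norm2 = dI_mu @ (Sigma0 / k1) @ dI_mu + dI_beta**2 / (k2**2 * alpha0**2)

dM2 = (mu - xbar) @ P @ (mu - xbar)
print(np.isclose(norm2, n**2 * (k1 * dM2 + 1)))  # True, cf. (41)
```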
Proposition 3.
For an arbitrary function h, using the repeated-index summation convention and denoting θ^i = μ^i, i = 1,…,m, and θ^{m+1} = β, the Laplacian is
$$ \Delta h = \frac{1}{k_1}\, \sigma_{ij}\, \frac{\partial^{2} h}{\partial \mu^{i} \partial \mu^{j}} + \frac{1}{k_2^{2}\alpha_0^{2}}\, \frac{\partial^{2} h}{\partial \beta^{2}}. \tag{44} $$
In particular,
$$ \Delta I_{(x,t)} = n m, \qquad t_{(n k_2)} \leq \beta \leq t_{max}, \tag{45} $$
where the second derivatives at β = t_{(nk_2)} are understood as right second derivatives.
Proof. 
With the above-mentioned notation, the Laplacian of h is
$$ \Delta h = \frac{1}{\sqrt{g}}\, \frac{\partial}{\partial \theta^{i}}\!\left( \sqrt{g}\, g^{ij}\, \frac{\partial h}{\partial \theta^{j}} \right) = \frac{1}{k_1}\, \sigma_{ij}\, \frac{\partial^{2} h}{\partial \mu^{i} \partial \mu^{j}} + \frac{1}{k_2^{2}\alpha_0^{2}}\, \frac{\partial^{2} h}{\partial \beta^{2}}, \tag{46} $$
obtaining (44). Moreover, taking into account (38) and (39), we have
$$ \frac{\partial^{2} I_{(x,t)}}{\partial \mu^{i} \partial \mu^{j}} = n k_1\, \sigma^{ij}, \qquad \frac{\partial^{2} I_{(x,t)}}{\partial \beta^{2}} = 0, \qquad \beta \geq t_{(n k_2)}. \tag{47} $$
Since, by the repeated-index summation convention, σ_{ij}σ^{ij} = m, we shall have
$$ \Delta I_{(x,t)} = \frac{1}{k_1}\, \sigma_{ij}\, n k_1\, \sigma^{ij} = n m, \qquad \beta \geq t_{(n k_2)}, \tag{48} $$
obtaining (45). □
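Since the metric (30) is constant, the Laplacian (44) of I_{(x,t)} reduces to a trace; the following sketch (illustrative values) confirms that it equals nm, as in (45).

```python
import numpy as np

# Minimal sketch (illustrative values): with the constant metric (30), the
# Laplacian (44) of I_(x,t) is (1/k1) * tr(Sigma0 @ Hess_mu), the beta-part
# being zero; this equals n*m as in (45).
n, k1, m = 5, 4, 2
Sigma0 = np.array([[1.5, 0.2],
                   [0.2, 0.9]])
Hess_mu = n * k1 * np.linalg.inv(Sigma0)   # second derivatives, cf. (47)
print(np.isclose(np.trace(Sigma0 @ Hess_mu) / k1, n * m))  # True
```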

2.5. The Fundamental Equation for the Model

Combining Propositions 1, 2, and 3, the fundamental Equation (24) for model (28), with joint sample likelihood (35) and the stated boundary conditions, can be written as follows.
Proposition 4.
Under the repeated-index summation convention, the fundamental Equation (24) for model (28) takes the form
$$ -4\left( \frac{\sigma_{ij}}{k_1}\, \frac{\partial^{2} \Psi}{\partial \mu^{i} \partial \mu^{j}} + \frac{1}{k_2^{2}\alpha_0^{2}}\, \frac{\partial^{2} \Psi}{\partial \beta^{2}} \right) + n^{2}\bigl( k_1\, d_M^{2}(\mu,\bar{x}_{n k_1}) + 1 \bigr)\Psi = n(2m + \lambda)\Psi, \qquad t_{(n k_2)} \leq \beta \leq t_{max}, \tag{49} $$
where t_max is arbitrarily large. Equation (49) has to be solved with the boundary conditions given by lim_{β→t_{(nk_2)}^+} Ψ(μ, β) = c_1 and lim_{β→t_max^−} Ψ(μ, β) = c_2, where c_1 and c_2 do not depend on β; lim_{μ^α→−∞} Ψ(μ, β) = lim_{μ^α→+∞} Ψ(μ, β) = 0 for α = 1,…,m; and ∫_{Θ_{(x,t)}} Ψ²(θ) V(dθ) = 1, where Θ_{(x,t)} = {θ ∈ R^{m+1} : t_{(nk_2)} ≤ β ≤ t_max}.
Proof. 
Equation (49) is an immediate consequence of (24), (41), and (45). □

2.6. A Fundamental Equation Solution for the Model

We now obtain a nontrivial, variable-separated solution of (49).
Proposition 5.
Let Φ ( μ ) and Υ ( β ) be functions of μ and β, respectively, such that
$$ -\frac{4\,\sigma_{ij}}{k_1}\, \frac{\partial^{2} \Phi}{\partial \mu^{i} \partial \mu^{j}} + \bigl( n^{2} k_1\, d_M^{2}(\mu,\bar{x}_{n k_1}) - 2 n m \bigr)\Phi = \xi\,\Phi, \tag{50} $$
and
$$ \frac{4}{k_2^{2}\alpha_0^{2}}\, \frac{\partial^{2} \Upsilon}{\partial \beta^{2}} + ( n\lambda - n^{2} )\Upsilon = \xi\,\Upsilon. \tag{51} $$
Here ξ is a constant, and in (50) the repeated-index summation convention is used. Equations (50) and (51) are solved under the boundary conditions Φ(μ) → 0 as μ^α → ±∞ (α = 1,…,m), lim_{β→t_{(nk_2)}^+} Ψ(μ, β) = c_1, and lim_{β→t_max^−} Ψ(μ, β) = c_2, with c_1, c_2 constants independent of β, together with the normalization ∫_{Θ_{(x,t)}} Ψ²(θ) V(dθ) = 1. Then Ψ(μ, β) = Φ(μ)Υ(β) is a variable-separated solution of (49) in Proposition 4.
Proof. 
Observe that if Ψ ( μ , β ) = Φ ( μ ) Υ ( β ) , then (49) is written as
$$ \left[ -\frac{4\,\sigma_{ij}}{k_1}\, \frac{\partial^{2} \Phi}{\partial \mu^{i} \partial \mu^{j}} + \bigl( n^{2} k_1\, d_M^{2}(\mu,\bar{x}_{n k_1}) - 2 n m \bigr)\Phi \right]\frac{1}{\Phi} = \left[ \frac{4}{k_2^{2}\alpha_0^{2}}\, \frac{\partial^{2} \Upsilon}{\partial \beta^{2}} + ( n\lambda - n^{2} )\Upsilon \right]\frac{1}{\Upsilon}. \tag{52} $$
The left-hand side of (52) depends only on μ, whereas the right-hand side depends only on β. Hence both sides must equal a constant, say ξ, yielding (50) and (51). Moreover, the boundary conditions on Φ and Υ entail those on Ψ. □

2.7. Solving (50) and (51) for m = 1

For m = 1, with σ² = σ_{11}, μ = μ^1, and Φ'' = ∂²Φ/∂μ², division by 2n² turns (50) into
$$ -\frac{2\sigma^{2}}{n^{2} k_1}\, \Phi'' + \frac{k_1}{2\sigma^{2}}\, (\mu - \bar{x}_{n k_1})^{2}\, \Phi = \left( \frac{\xi}{2 n^{2}} + \frac{1}{n} \right)\Phi, \tag{53} $$
which must be solved with the boundary conditions lim_{μ→−∞} Φ(μ) = lim_{μ→+∞} Φ(μ) = 0. This is essentially Equation (30) or (36) of [8]. No nontrivial solutions exist unless
$$ \frac{\xi}{2 n^{2}} + \frac{1}{n} = \frac{2}{n}\left( \nu + \frac12 \right) = E_\nu, \qquad \nu = 0, 1, 2, \dots \tag{54} $$
This fixes ξ = 4nν, ν = 0, 1, 2, …, with E_ν being the energy. For the ground state (ν = 0), E_0 = 1/n. Nontrivial solutions for each ν are given by Hermite polynomials [8]; the wave functions for ν = 0, 1 are
$$ \phi_0(\mu) = \left( \frac{n k_1}{2\pi\sigma^{2}} \right)^{\frac14} e^{-\frac{n k_1}{4\sigma^{2}} (\mu - \bar{x}_{n k_1})^{2}}, \tag{55} $$
and
$$ \phi_1(\mu) = \left( \frac{n^{3} k_1}{2\pi\sigma^{2}} \right)^{\frac14} \frac{\sqrt{k_1}\,(\bar{x}_{n k_1} - \mu)}{\sigma}\; e^{-\frac{n k_1}{4\sigma^{2}} (\mu - \bar{x}_{n k_1})^{2}}, \tag{56} $$
respectively.
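The following sketch (ours, with illustrative values) verifies by finite differences that φ_0 in (55) satisfies (53) with E_0 = 1/n, and that φ_0² is the N(x̄_{nk_1}, σ²/(nk_1)) density, that is, the flat-prior Bayesian posterior anticipated at the end of Section 2.3.

```python
import numpy as np

# Minimal sketch (illustrative values): finite-difference check that phi_0 in
# (55) satisfies (53) with E_0 = 1/n, and that phi_0^2 is the
# N(xbar, sigma^2/(n*k1)) density, i.e., the flat-prior Bayesian posterior
# anticipated at the end of Section 2.3.
n, k1, sigma2, xbar = 7, 3, 1.4, 0.9
mu = np.linspace(xbar - 3, xbar + 3, 20001)
h = mu[1] - mu[0]

phi0 = ((n * k1 / (2 * np.pi * sigma2)) ** 0.25
        * np.exp(-n * k1 / (4 * sigma2) * (mu - xbar) ** 2))
phi0_dd = np.gradient(np.gradient(phi0, h), h)

lhs = -2 * sigma2 / (n**2 * k1) * phi0_dd + k1 / (2 * sigma2) * (mu - xbar)**2 * phi0
inner = slice(100, -100)   # avoid one-sided differences at the edges
print(np.max(np.abs(lhs[inner] - (1 / n) * phi0[inner])))  # ~0 (discretization)
print(np.isclose(np.trapz(phi0**2, mu), 1.0))              # Lebesgue-normalized
```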
On the other hand, Equation (51) results in
$$ \frac{4}{k_2^{2}\alpha_0^{2}}\, \frac{\partial^{2} \Upsilon}{\partial \beta^{2}} + ( n\lambda - \xi - n^{2} )\Upsilon = 0, \qquad t_{(n k_2)} \leq \beta \leq t_{max}. \tag{57} $$
Taking into account that ξ = 4nν for ν = 0, 1, 2, …, we obtain
$$ \frac{\partial^{2} \Upsilon}{\partial \beta^{2}} = k_2^{2}\alpha_0^{2}\, n\left( \nu - \frac14 (\lambda - n) \right)\Upsilon, \qquad t_{(n k_2)} \leq \beta \leq t_{max}, \tag{58} $$
and also, making the energy E_ν explicit instead of ν, we shall have
$$ \frac{\partial^{2} \Upsilon}{\partial \beta^{2}} = k_2^{2}\alpha_0^{2}\, n^{2}\left( \frac{n-\lambda}{4n} - \frac12\left( \frac{1}{n} - E_\nu \right) \right)\Upsilon, \qquad t_{(n k_2)} \leq \beta \leq t_{max}. \tag{59} $$
Basically, we can re-express Equation (59) as Υ'' = cte · Υ, where cte denotes the constant multiplying Υ on the right-hand side of (59). Provided that cte ≠ 0, we obtain the general solution
$$ \Upsilon(\beta) = C_1\, e^{\sqrt{cte}\,\beta} + C_2\, e^{-\sqrt{cte}\,\beta}, \tag{60} $$
and if cte = 0 , then
$$ \Upsilon(\beta) = C_1\,\beta + C_2. \tag{61} $$
To satisfy the boundary conditions, we must carefully choose the integration constants and the value of λ . Several cases can be considered; however, at this point, we only indicate that convenient values of λ can yield compatible solutions with the wave equation of a quantum harmonic oscillator.
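For instance, when cte > 0 in (60), the two integration constants can be fitted to prescribed boundary values by solving a 2 × 2 linear system; the sketch below uses illustrative values of our choosing for cte, the β-interval, and c_1, c_2.

```python
import numpy as np

# Minimal sketch (illustrative values for cte, the beta-interval, and c1, c2):
# for cte > 0 in (60), the integration constants solve a 2x2 linear system
# fixed by the boundary values at beta = t_(nk2) and beta = t_max.
cte, t_low, t_max = 0.35, 1.0, 6.0
c1, c2 = 0.8, 0.2
r = np.sqrt(cte)

M = np.array([[np.exp(r * t_low), np.exp(-r * t_low)],
              [np.exp(r * t_max), np.exp(-r * t_max)]])
C1, C2 = np.linalg.solve(M, np.array([c1, c2]))

Upsilon = lambda b: C1 * np.exp(r * b) + C2 * np.exp(-r * b)
print(np.isclose(Upsilon(t_low), c1), np.isclose(Upsilon(t_max), c2))  # True True
```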
For instance, consider the case in which we are in the ground state of the quantum harmonic oscillator (ν = 0, E_0 = 1/n), which nullifies the term ½(1/n − E_ν) in (59). From Equation (59), we then obtain the following
$$ -\frac{4}{k_2^{2}\alpha_0^{2}}\, \frac{\partial^{2} \Upsilon}{\partial \beta^{2}} + n^{2}\,\Upsilon = \lambda n\,\Upsilon, \qquad t_{(n k_2)} \leq \beta \leq t_{max}. \tag{62} $$
This equation again represents an eigenvalue problem of the form −k ΔΥ + cVΥ = EΥ, where E = λn is the eigenvalue. If we set λ = 1/n², we find that the energy corresponds to the intrinsic Cramér–Rao bound m/n = 1/n, where m is the number of parameters involved. In the exponential case, we have m = 1.

3. Discussion

This paper makes a significant contribution to the theoretical foundations of variational informational principles by presenting a robust framework that adeptly integrates complex variables and norms of arbitrary order. To demonstrate the practical implications of our framework, we offer a compelling model example that illustrates how these innovative principles can be effectively applied. We aim to inspire further exploration of the intricate relationship between complex analysis, information theory, and physics. This relationship holds great potential for addressing some of the most profound challenges in contemporary science [18,19].
Our work establishes a robust foundation for a transformative class of variational methods that has the potential to make a significant impact across multiple disciplines, including data analysis and applied research. These innovative methods provide researchers with advanced tools for analyzing and optimizing complex informational structures within high-dimensional environments, effectively addressing the challenges posed by intricate data landscapes. By facilitating the efficient manipulation of sophisticated variables and accommodating arbitrary-order norms, our approach paves the way for the development of more resilient algorithms applicable to various fields, including data analysis, machine learning, and signal processing. These advancements promise not only enhanced computational performance but also more profound insights into the fundamental dynamics of complex systems [20,21,22].
This foundational research aims to inspire further exploration and innovation within the scientific community. By harnessing the versatility of these variational methods, we can create advanced models that effectively capture the intricate complexities of both natural phenomena and engineered systems. We are confident that ongoing investigations will lead to the development of powerful analytical tools, significantly enhancing our understanding of high-dimensional informational structures and improving the performance of algorithms across various scientific and technological fields [23,24].
Ultimately, our work aims to catalyze pioneering research initiatives that will extend the frontiers of existing methodologies. By incorporating advanced technologies and interdisciplinary approaches, we seek to unveil new opportunities that will lead to a more profound understanding of the fundamental laws of nature. This endeavor not only opens doors to innovative discoveries but also fosters collaboration among researchers dedicated to exploring the universe’s complexities and the principles that govern it.

Author Contributions

Conceptualization, D.B.-C. and J.M.O.; writing—original draft preparation, D.B.-C. and J.M.O.; writing—review and editing, D.B.-C. and J.M.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Jaynes, E.T. Information Theory and Statistical Mechanics. Phys. Rev. 1957, 106, 620–630. [Google Scholar] [CrossRef]
  2. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley Series in Telecommunications and Signal Processing; Wiley-Interscience: Hoboken, NJ, USA, 2006. [Google Scholar]
  3. Boucheron, S.; Lugosi, G.; Massart, P. Concentration Inequalities: A Nonasymptotic Theory of Independence; Oxford University Press: Oxford, UK, 2013. [Google Scholar]
  4. Vidal, G.; Werner, R.F. Computable measure of entanglement. Phys. Rev. A 2002, 65, 032314. [Google Scholar] [CrossRef]
  5. Liese, F.; Vajda, I. Convex Statistical Distances. In Asymptotic Methods in Statistical Decision Theory and Inference; Elsevier: Amsterdam, The Netherlands, 2006; pp. 65–128. [Google Scholar]
  6. Bernal-Casas, D.; Oller, J.M. Variational Information Principles to Unveil Physical Laws. Mathematics 2024, 12, 3941. [Google Scholar] [CrossRef]
  7. Bernal-Casas, D.; Oller, J.M. Information-Theoretic Models for Physical Observables. Entropy 2023, 25, 1448. [Google Scholar] [CrossRef] [PubMed]
  8. Bernal-Casas, D.; Oller, J.M. Intrinsic Information-Theoretic Models. Entropy 2024, 26, 370. [Google Scholar] [CrossRef] [PubMed]
  9. Bernal-Casas, D.; Oller, J.M. Analyzing Sample Size in Information-Theoretic Models. Mathematics 2024, 12, 4018. [Google Scholar] [CrossRef]
  10. Rao, C. Information and Accuracy Attainable in Estimation of Statistical Parameters. Bull. Calcutta Math. Soc. 1945, 37, 81–91. [Google Scholar]
  11. Burbea, J.; Rao, C.R. Entropy differential metric, distance and divergence measures in probability spaces: A unified approach. J. Multivar. Anal. 1982, 12, 575–596. [Google Scholar] [CrossRef]
  12. Amari, S.-i. Information Geometry and Its Applications, 1st ed.; Springer Publishing Company, Incorporated: Berlin/Heidelberg, Germany, 2016. [Google Scholar]
  13. Oller, J.M.; Corcuera, J.M. Intrinsic Analysis of Statistical Estimation. Ann. Stat. 1995, 23, 1562–1581. [Google Scholar] [CrossRef]
  14. Nielsen, F. An Elementary Introduction to Information Geometry. Entropy 2020, 22, 1100. [Google Scholar] [CrossRef] [PubMed]
  15. Rudin, W. Real and Complex Analysis, 3rd ed.; McGraw-Hill: New York, NY, USA, 1987. [Google Scholar]
  16. Yoshioka, M.; Tanaka, F. Information-Geometric Approach for a One-Sided Truncated Exponential Family. Entropy 2023, 25, 769. [Google Scholar] [CrossRef] [PubMed]
  17. Mahalanobis, P. On the generalized distance in Statistics. Proc. Nat. Inst. Sc. India 1936, 2, 49–55. [Google Scholar]
  18. Amari, S.-i.; Nagaoka, H. Methods of Information Geometry; Translations of Mathematical Monographs, American Mathematical Society: Providence, RI, USA, 2000. [Google Scholar]
  19. Scully, M.O.; Zubairy, M.S. Quantum Optics; Cambridge University Press: Cambridge, UK, 1997. [Google Scholar] [CrossRef]
  20. Blei, D.M.; Kucukelbir, A.; McAuliffe, J.D. Variational Inference: A Review for Statisticians. J. Am. Stat. Assoc. 2017, 112, 859–877. [Google Scholar] [CrossRef]
  21. Katsoulakis, M.A.; Vilanova, P. Data-driven, variational model reduction of high-dimensional reaction networks. J. Comput. Phys. 2020, 401, 108997. [Google Scholar] [CrossRef]
  22. Lee, S.; Holzinger, A. Knowledge Discovery from Complex High Dimensional Data. In Solving Large Scale Learning Tasks. Challenges and Algorithms; Michaelis, S., Piatkowski, N., Stolpe, M., Eds.; Lecture Notes in Computer Science Book Series; Springer International: Berlin/Heidelberg, Germany, 2016; pp. 148–167. [Google Scholar] [CrossRef]
  23. Johnstone, I.M.; Titterington, D.M. Statistical challenges of high-dimensional data. Philos. Trans. R. Soc. Math. Phys. Eng. Sci. 2009, 367, 4237–4253. [Google Scholar] [CrossRef] [PubMed]
  24. Maturo, F.; Rambaud, S.C.; Ventre, V. Advances in statistical learning from high-dimensional data. Qual. Quant. Int. J. Methodol. 2025, 59, 1933–1937. [Google Scholar] [CrossRef]