We construct geometric shrinkage priors for Kählerian signal filters. Based on the characteristics of Kähler manifolds, we introduce an efficient and robust algorithm for finding superharmonic priors that outperform the Jeffreys prior. Several ansätze for the Bayesian predictive priors are also suggested. In particular, the ansätze related to the Kähler potential are priors geometrically intrinsic to the information manifold, whose geometry is derived from the potential. The implications of the algorithm for time series models are also provided.
Keywords: Kähler manifold; information geometry; Bayesian prediction; superharmonic prior
1. Introduction
In information geometry, signal processing is one of the most important applications. In particular, the information geometric approach to various linear time series models is well known [1–7]. The geometric description of linear systems is not confined to the pursuit of mathematical beauty. Komaki’s work is in the line of developing practical tools for Bayesian inference. Using the Kullback–Leibler divergence as a risk function for estimation, he found that superharmonic shrinkage priors outperform the Jeffreys prior from the viewpoint of information theory. Better prediction in the Bayesian framework is attainable with the Komaki priors.
However, the difficult part of Komaki’s idea in practice is verifying whether or not a prior function is superharmonic. In particular, when high-dimensional statistical manifolds are considered, it is technically tricky to test the superharmonicity of prior functions because the Laplace–Beltrami operators on the manifolds are non-trivial. Although some superharmonic priors for the autoregressive (AR) models were found not only in the two-dimensional cases [5,7] but also in arbitrary dimensions, nothing is known about the Bayesian shrinkage priors of more complicated models such as the autoregressive moving average (ARMA) models, the fractionally integrated ARMA (ARFIMA) models, and arbitrary signal filters. Additionally, generic algorithms for systematically obtaining the information shrinkage priors are not yet known.
The connection between Kähler manifolds and information geometry has been reported [4,9–12], and the mathematical correspondence between a Kähler manifold and the information geometry of a linear system was recently revealed. It was found that the information geometry of a signal filter with a finite complex cepstrum norm is a Kähler manifold. In particular, the Hermitian condition on the Kählerian information manifolds is clearly seen under conditions on the transfer function of the linear system. Moreover, many practical aspects of introducing Kähler manifolds to information geometry for signal processing were also reported in the same literature. One of the benefits of the Kählerian information geometry is that the simpler form of the Laplace–Beltrami operator on the Kähler manifold makes it easier to find the Komaki priors.
In this paper, we construct Komaki-style shrinkage priors for Kählerian signal filters. By introducing an algorithm based on the characteristics of Kähler manifolds, the Bayesian predictive priors outperforming the Jeffreys prior can be obtained in a more efficient and more robust way. Several prior ansätze are also suggested. Among the ansätze, the geometric shrinkage priors related to the Kähler potential are intrinsic priors on the information manifold because the geometry is given by the Kähler potential. We also provide the geometric priors for the ARFIMA models, for which the Komaki priors have not been reported. The structure of this paper is as follows. In the next section, the theoretical background of Kählerian information geometry and superharmonic priors is introduced. In Section 3, an algorithm and ansätze for the geometric shrinkage priors are suggested. The implications of the algorithm for the ARFIMA models are given in Section 4. We conclude the paper in the last section.
2. Theoretical Backgrounds
2.1. Kählerian Filters
A linear filter with n-dimensional complex parameters ξ is characterized by a transfer function h(w; ξ) in the frequency domain w with
$$y(w) = h(w; \xi)\, x(w),$$
where y and x are complex output and input signals, respectively. A spectral density function S(w; ξ) is defined as the absolute square of the transfer function,
$$S(w; \xi) = |h(w; \xi)|^2,$$
and it is a real-valued measurable quantity.
In information geometry, it is well known from Amari and Nagaoka that the geometry of a linear system is determined by the spectral density function S(w; ξ) under the stability condition, the minimum phase condition, and
$$\frac{1}{2\pi}\int_{-\pi}^{\pi} |\log S(w; \xi)|^2\, dw < \infty.$$
The last condition is also known as the finite unweighted norm of the power cepstrum of a filter [13,14]. For a linear system with the spectral density function satisfying the above conditions, the metric tensor of the information geometry is given by
$$g_{\mu\nu}(\xi) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \big(\partial_\mu \log S(w; \xi)\big)\big(\partial_\nu \log S(w; \xi)\big)\, dw,$$
where the partial derivatives are taken with respect to the model parameters ξ.
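As a numerical sanity check of the spectral form of the metric, the sketch below evaluates it for a one-parameter AR(1) filter, h(w; ξ) = 1/(1 − ξ e^{−iw}) with a real pole ξ. The AR(1) choice, the function names, and the grid size are illustrative assumptions that do not appear in the text, and the metric is assumed to take the form g(ξ) = (1/2π)∫(∂ξ log S)² dw. Under this normalization, expanding ∂ξ log S in a Fourier series gives the closed form 2/(1 − ξ²), which the quadrature should reproduce.

```python
import math


def log_spectral_density_derivative(w, xi):
    """d/dxi of log S(w; xi) for the AR(1) filter h = 1 / (1 - xi * exp(-i w)), real xi.

    Here S = 1 / (1 - 2 xi cos(w) + xi^2), so log S = -log(1 - 2 xi cos(w) + xi^2).
    """
    return (2.0 * math.cos(w) - 2.0 * xi) / (1.0 - 2.0 * xi * math.cos(w) + xi * xi)


def metric_ar1(xi, n_grid=4096):
    """g(xi) = (1 / 2 pi) * integral over [-pi, pi] of (d log S / d xi)^2 dw.

    The integrand is smooth and periodic, so the plain rectangle rule converges quickly.
    """
    dw = 2.0 * math.pi / n_grid
    total = 0.0
    for k in range(n_grid):
        w = -math.pi + k * dw
        total += log_spectral_density_derivative(w, xi) ** 2
    return total * dw / (2.0 * math.pi)
```

For instance, `metric_ar1(0.5)` should agree with 2/(1 − 0.25) up to rounding error.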
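The metric tensor can be expressed in a complexified coordinate system and the Z-transformed transfer function. With the Z-transformation, the holomorphic transfer function can be written as a series expansion in z,
$$h(z; \xi) = \sum_{r=0}^{\infty} h_r(\xi)\, z^{-r}, \tag{1}$$
where hr is an impulse response function. The Z-transformed power spectrum is defined in a similar way. In this case, the conditions on the transfer function for constructing information geometry are identical to those in the spectral density function representation except for
$$\frac{1}{2\pi i}\oint_{|z|=1} |\log h(z; \xi)|^2\, \frac{dz}{z} < \infty,$$
and it is a necessary condition for the finite power cepstrum norm. The condition indicates that the Hardy norm of the logarithmic transfer function, also known as the unweighted complex cepstrum norm [14,15], is finite. The metric tensor of the geometry is given by the transfer function,
$$g_{ij}(\xi) = \frac{1}{2\pi i}\oint_{|z|=1} \partial_i \log h(z;\xi)\, \partial_j \log h(z;\xi)\, \frac{dz}{z}, \tag{2}$$
$$g_{i\bar j}(\xi) = \frac{1}{2\pi i}\oint_{|z|=1} \partial_i \log h(z;\xi)\, \partial_{\bar j} \log \bar h(\bar z;\bar\xi)\, \frac{dz}{z}, \tag{3}$$
where i, j run from 1 to n and $g_{\bar i\bar j}$, $g_{\bar i j}$ are the complex conjugates of $g_{ij}$ and $g_{i\bar j}$, respectively.
After plugging the Z-transformed transfer function, Equation (1), into the metric tensor expressions, Equations (2) and (3), the metric tensor is expressed with the series expansion coefficients in z of the logarithmic transfer function by
$$g_{ij} = \partial_i \eta_0\, \partial_j \eta_0, \qquad g_{i\bar j} = \partial_i \eta_0\, \partial_{\bar j} \bar\eta_0 + \sum_{r=1}^{\infty} \partial_i \eta_r\, \partial_{\bar j} \bar\eta_r,$$
where ηr is the coefficient of z−r in the series expansion of the logarithmic transfer function, also known as a complex cepstrum coefficient. It is obvious that η0 = log h0.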
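The complex cepstrum coefficients can also be obtained numerically as Fourier coefficients of the logarithmic transfer function on the unit circle. The following is a minimal sketch under stated assumptions: it uses an AR(1) filter h(z; ξ) = 1/(1 − ξ z^{−1}) with h0 = 1 as a test case, for which log h = Σ_{r≥1} ξ^r z^{−r}/r, so that ηr = ξ^r/r and η0 = log h0 = 0. The function names and the grid size are hypothetical and not taken from the text.

```python
import cmath
import math


def cepstrum_coefficient(log_h, r, n_grid=1024):
    """Approximate eta_r = (1 / 2 pi i) * contour integral of log h(z) * z^r dz / z on |z| = 1.

    On the unit circle dz / z = i dw, so the integral reduces to the mean of
    log h(z) * z^r over equally spaced points z = exp(i w).
    """
    total = 0j
    for k in range(n_grid):
        z = cmath.exp(2j * math.pi * k / n_grid)
        total += log_h(z) * z ** r
    return total / n_grid


def ar1_log_transfer(xi):
    """Return log h(z; xi) for the AR(1) filter h = 1 / (1 - xi / z), assuming |xi| < 1."""
    return lambda z: -cmath.log(1.0 - xi / z)
```

For example, `cepstrum_coefficient(ar1_log_transfer(0.4 + 0.3j), 2)` should agree with `(0.4 + 0.3j) ** 2 / 2` up to rounding error, and the r = 0 coefficient should vanish because h0 = 1.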
Recently, Choi and Mullhaupt found that the information geometry of a linear system with a finite Hardy norm of the logarithmic transfer function (i.e., a finite complex cepstrum norm) is a Kähler manifold, that is, a Hermitian manifold with a closed Kähler two-form:
$$g_{ij} = g_{\bar i \bar j} = 0$$
for the Hermitian manifold and
$$\partial_i g_{j\bar k} = \partial_j g_{i\bar k}, \qquad \partial_{\bar i} g_{k\bar j} = \partial_{\bar j} g_{k\bar i}$$
for the closed Kähler two-form. Additionally, the Hermitian structure can be explicitly seen in the metric tensor if and only if the impulse response function with the highest degree in z, i.e., h0 in the unilateral transfer function case, is constant in the model parameters ξ. In this paper, for simplicity, we only consider unilateral transfer functions with non-zero h0 and Kähler manifolds with the explicit Hermitian conditions on the metric tensors, because complex manifolds are always Hermitian manifolds. In this case, the necessary and sufficient condition for being a Kähler manifold is that h0(ξ) is constant in ξ.
According to Choi and Mullhaupt, the benefits of the Kählerian description are as follows. First of all, geometric objects are straightforwardly computed on a Kähler manifold. The non-trivial metric tensor component is simply derived from the following formula
$$g_{i\bar j} = \partial_i \partial_{\bar j} K, \tag{4}$$
where K is the Kähler potential of the geometry. The Kähler potential in the information geometry of a linear filter is the square of the Hardy norm (or H2-norm) of the logarithmic transfer function (or the square of the complex cepstrum norm) on the unit disk ⅅ
$$K = \frac{1}{2\pi i}\oint_{|z|=1} |\log h(z;\xi)|^2\, \frac{dz}{z}, \tag{5}$$
and the details of the derivation are given in the literature. The non-trivial components of the Levi–Civita connection are expressed as
$$\Gamma_{ij,\bar k} = \partial_i \partial_j \partial_{\bar k} K, \tag{6}$$
and the other connection components are all vanishing. Notice that it is much simpler than the connection components on a non-Kähler manifold given by
$$\Gamma_{ij,k} = \frac{1}{2}\left(\partial_i g_{jk} + \partial_j g_{ik} - \partial_k g_{ij}\right),$$
and it is obvious that the number of calculation steps is significantly reduced in the Kähler case. The Riemann curvature tensor of the linear system geometry is also represented in the simpler form which is given in Choi and Mullhaupt. The Ricci tensor on the Kähler manifold is obtained as
$$R_{i\bar j} = -\partial_i \partial_{\bar j} \log \mathcal{G},$$
where $\mathcal{G}$ is the determinant of the metric tensor. It is evident that we can skip the calculation of the Riemann curvature tensor in order to compute the Ricci tensor on a Kähler manifold.
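To make these formulas concrete, the following worked example applies them to the simplest case, a one-dimensional AR(1) filter h(z; ξ) = 1/(1 − ξ z^{−1}) with |ξ| < 1 and h0 = 1. The AR(1) choice is an illustrative assumption and not part of the text. The computation assumes that the metric, the connection, and the Ricci tensor are obtained from the Kähler potential as g = ∂∂̄K, Γ = ∂∂∂̄K, and R = −∂∂̄ log det g, and that the squared Hardy norm equals the sum of squared moduli of the cepstrum coefficients.

```latex
% AR(1) filter with pole \xi, |\xi| < 1, and h_0 = 1 (illustrative assumption)
\begin{align*}
\log h(z;\xi) &= -\log\left(1 - \xi z^{-1}\right)
  = \sum_{r=1}^{\infty} \frac{\xi^{r}}{r}\, z^{-r},
  \qquad \eta_r = \frac{\xi^{r}}{r}, \\
K &= \sum_{r=1}^{\infty} |\eta_r|^2
  = \sum_{r=1}^{\infty} \frac{|\xi|^{2r}}{r^{2}}
  = \mathrm{Li}_2\!\left(|\xi|^2\right), \\
g_{\xi\bar\xi} &= \partial_\xi \partial_{\bar\xi} K
  = \sum_{r=1}^{\infty} |\xi|^{2(r-1)}
  = \frac{1}{1-|\xi|^2}, \\
\Gamma_{\xi\xi,\bar\xi} &= \partial_\xi \partial_\xi \partial_{\bar\xi} K
  = \frac{\bar\xi}{\left(1-|\xi|^2\right)^{2}}, \\
R_{\xi\bar\xi} &= -\partial_\xi \partial_{\bar\xi} \log g_{\xi\bar\xi}
  = \partial_\xi \partial_{\bar\xi} \log\left(1-|\xi|^2\right)
  = -\frac{1}{\left(1-|\xi|^2\right)^{2}}.
\end{align*}
```

In this example the Kähler potential is bounded above on the unit disk by Li₂(1) = π²/6, and every geometric object follows from derivatives of K alone.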
Additionally, the α-generalization of the geometric objects is linear in α on Kähler manifolds. Since the Riemann curvature tensor on a Kähler manifold is linear in the α-connection which is α-linear, the Riemann tensor also exhibits the α-linearity which leads to the α-linear Ricci tensor and scalar curvature.
In addition to these advantages, any complex submanifold of a Kähler manifold is also a Kähler manifold. If the information geometry of a given statistical model is a Kähler manifold, the information geometry of its submodels is also Kähler, and the submanifolds inherit the properties of the ambient manifold.
Lastly, the Kählerian information geometry is also useful to find superharmonic priors because of the simpler Laplace–Beltrami operators on the manifolds. We will cover the details of the superharmonic priors soon.
2.2. Superharmonic Priors
For further discussions, we need to introduce the superharmonic priors suggested by Komaki. When we want to find the true probability distribution p(y|ξ) based on given samples x of size N, one of the best approaches is using the Bayesian predictive density pπ(y|x(N)) with a prior π(ξ):
$$p_\pi(y|x^{(N)}) = \frac{\int p(y|\xi)\, p(x^{(N)}|\xi)\, \pi(\xi)\, d\xi}{\int p(x^{(N)}|\xi)\, \pi(\xi)\, d\xi}.$$
The superharmonic priors πI are derived from the difference between two risk functions with respect to the true probability density, one from the Jeffreys prior and another from the superharmonic prior:
$$E\big[D_{KL}\big(p(y|\xi)\,\|\,p_{\pi_J}(y|x^{(N)})\big)\,\big|\,\xi\big] - E\big[D_{KL}\big(p(y|\xi)\,\|\,p_{\pi_I}(y|x^{(N)})\big)\,\big|\,\xi\big] = \frac{1}{2N^2}\, g^{ij}\, \partial_i \log\frac{\pi_I}{\pi_J}\, \partial_j \log\frac{\pi_I}{\pi_J} - \frac{1}{N^2}\, \frac{\pi_J}{\pi_I}\, \Delta\frac{\pi_I}{\pi_J} + o(N^{-2}),$$
where DKL is the Kullback–Leibler divergence and πJ is the Jeffreys prior, which is the volume form of the statistical manifold. Each risk function indicates how far a given Bayesian predictive density is from the true distribution in the Kullback–Leibler divergence on average. Since better priors are obtained from smaller risk functions, the priors outperforming the Jeffreys prior make the above expression greater than zero. Since the first term on the right-hand side is non-negative, the risk function of the Komaki prior is decreased with respect to the risk function of the Jeffreys prior if a prior function ψ = πI/πJ is superharmonic. If a superharmonic prior function ψ can be found, it is possible to do better Bayesian prediction from the viewpoint of information theory. In the same paper, Komaki also pointed out that shrinkage priors improve on the Jeffreys prior in prediction, in the information-theoretic sense, if and only if the square root of the prior function is superharmonic.
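The role of the square root can be seen from a short identity. The sketch below assumes that the leading-order risk difference is a positive multiple of (1/2) g^{ij} ∂_i log ψ ∂_j log ψ − Δψ/ψ with ψ = πI/πJ > 0, as in Komaki's expansion; the symbols follow the text, and the derivation itself is only an illustration.

```latex
% \psi = \pi_I / \pi_J > 0; \Delta is the Laplace--Beltrami operator of the metric g_{ij}
\begin{align*}
\Delta\sqrt{\psi}
  &= \frac{\Delta\psi}{2\sqrt{\psi}}
   - \frac{g^{ij}\,\partial_i\psi\,\partial_j\psi}{4\,\psi^{3/2}}, \\
\frac{1}{2}\, g^{ij}\,\partial_i \log\psi\,\partial_j \log\psi - \frac{\Delta\psi}{\psi}
  &= -\,\frac{2\,\Delta\sqrt{\psi}}{\sqrt{\psi}}.
\end{align*}
```

Under that assumption the leading-order risk difference is positive exactly where Δ√ψ < 0. Superharmonicity of ψ itself is sufficient because it makes both terms on the left-hand side non-negative.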
Since Komaki’s paper , several superharmonic priors for the AR models have been found [5–7]. The Komaki prior for the AR(2) model in the pole coordinates  is given by
where ξi is a pole of the transfer function. Tanaka  generalized the two-dimensional case to superharmonic priors for the AR model in an arbitrary dimension p. The shrinkage prior function for the AR(p) model is in the form of
where ξi is a pole of the AR transfer function.
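As mentioned before, one of the advantages of the Kählerian description is that finding the Komaki prior functions becomes more efficient than in the non-Kähler description because the Laplace–Beltrami operators on Kähler manifolds take simpler forms. For a differentiable function ψ, the Laplace–Beltrami operator in the Kähler geometry is represented by
$$\Delta\psi = 2\, g^{i\bar j}\, \partial_i \partial_{\bar j} \psi.$$
Meanwhile, the Laplace–Beltrami operator on a non-Kähler manifold is expressed as
$$\Delta\psi = \frac{1}{\sqrt{\mathcal{G}}}\, \partial_i\!\left(\sqrt{\mathcal{G}}\, g^{ij}\, \partial_j \psi\right) = g^{ij}\, \partial_i \partial_j \psi + \left(\partial_i g^{ij}\right)\partial_j \psi + \frac{1}{2}\, g^{ij} \left(\partial_i \log \mathcal{G}\right) \partial_j \psi,$$
where $\mathcal{G}$ is the determinant of the metric tensor. It is obvious that additional calculations for the latter two terms on the right-hand side are indispensable in the non-Kähler cases.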
With the computational benefits on the Kählerian information manifolds, the superharmonic prior function for the Kähler-AR(2) model  is found
where ξi is the i-th pole of the transfer function and $\bar{\xi}_i$ is the complex conjugate of ξi. However, its generalization to arbitrary dimensions has remained unknown. Moreover, the Komaki priors for the ARMA models and the ARFIMA models have not yet been reported.
3. Geometric Shrinkage Priors
As shown in the previous section, Kähler manifolds in information geometry are useful in order to obtain the superharmonic priors. In this section, we introduce an algorithm to find the geometric shrinkage priors by using the properties of Kähler geometry. Moreover, several ansätze for the priors are suggested.
For further discussions, let us set
$$\tau = u^* - \kappa,$$
where u* is a constant in ξ and its complex conjugate $\bar{\xi}$. The following lemma is useful when the algorithm for the prior functions is constructed.
Lemma 1. On a Kähler manifold, a function ψ is superharmonic if ψ is in the form of ψ = Ψ(τ) with τ = u* − κ such that κ is subharmonic (or harmonic) and Ψ′(τ) > 0, Ψ″(τ) ≤ 0 (or Ψ′(τ) > 0, Ψ″(τ) < 0).
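Proof. The Laplace–Beltrami operator on ψ is given by
$$\Delta\psi = 2\, g^{i\bar j}\, \partial_i \partial_{\bar j} \Psi(u^* - \kappa) = 2\, \Psi''(\tau)\, g^{i\bar j}\, \partial_i \kappa\, \partial_{\bar j} \kappa - \Psi'(\tau)\, \Delta\kappa,$$
where the derivatives on Ψ are taken with respect to τ. Since the metric is positive definite, $g^{i\bar j}\, \partial_i \kappa\, \partial_{\bar j} \kappa$ is non-negative for real κ. It is then obvious that if κ is subharmonic (or harmonic) and if Ψ′(τ) > 0, Ψ″(τ) ≤ 0 (or Ψ′(τ) > 0, Ψ″(τ) < 0), then the right-hand side is negative, i.e., ψ is a superharmonic function. □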
According to Lemma 1, superharmonic functions are easily obtained from subharmonic or harmonic functions by simply plugging the (sub-)harmonic functions as κ into Lemma 1.
Since a prior function should be positive, Lemma 1 can be used to obtain superharmonic prior functions once the function ψ in Lemma 1 is restricted to be positive.
Theorem 1. On a Kähler manifold, a positive function ψ = Ψ(u*−κ) is a superharmonic prior function if κ is subharmonic (or harmonic) and Ψ′(τ) > 0, Ψ″(τ) ≤ 0 (or Ψ′(τ) > 0, Ψ″(τ) < 0).
Proof. Since this is a special case of Lemma 1, the proof is obvious. □
Although any (sub-)harmonic function κ can be used for constructing superharmonic priors, restrictions on κ make finding the ansätze of the geometric priors easier. From now on, we only consider upper-bounded functions κ. Additionally, we assume that κ and u* are real. With these assumptions, u* can be set to a constant greater than the upper bound of κ so that τ is positive.
Ansätze for Ψ can be found in the following example.
Example 1. Given subharmonic (or harmonic) κ and positive τ, i.e., upper-bounded κ, the following functions are candidates for Ψ
where 0 < a ≤ 1 (or 0 < a < 1).
Proof. We only cover the subharmonic case for κ here; the harmonic case is equally straightforward. First of all, Ψ1 and Ψ2 are both positive. For Ψ1, it is easy to verify the following:
for 0 < a ≤ 1. A similar calculation is repeated for Ψ2:
for 0 < a ≤ 1.
Both functions Ψ1 and Ψ2 satisfy the conditions for Ψ in Lemma 1.□
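The conditions of Lemma 1 on a candidate Ψ (positivity, Ψ′ > 0, Ψ″ ≤ 0 for τ > 0) are easy to check numerically. The sketch below does so by central finite differences for two concave increasing functions, τ^a and log(1 + τ^a) with 0 < a ≤ 1. These two functions, the helper names, the step size, and the tolerance are assumptions chosen for illustration and are not necessarily the exact ansätze of Example 1.

```python
import math


def power_candidate(tau, a):
    """Illustrative candidate Psi(tau) = tau^a (assumed form, 0 < a <= 1)."""
    return tau ** a


def log_candidate(tau, a):
    """Illustrative candidate Psi(tau) = log(1 + tau^a) (assumed form, 0 < a <= 1)."""
    return math.log(1.0 + tau ** a)


def first_and_second_derivative(f, tau, step=1e-3):
    """Central finite-difference estimates of f'(tau) and f''(tau)."""
    d1 = (f(tau + step) - f(tau - step)) / (2.0 * step)
    d2 = (f(tau + step) - 2.0 * f(tau) + f(tau - step)) / step ** 2
    return d1, d2


def satisfies_lemma_conditions(f, taus, tol=1e-6):
    """Check f > 0, f' > 0 and f'' <= 0 (up to tol) at every sample point tau > 0."""
    for tau in taus:
        d1, d2 = first_and_second_derivative(f, tau)
        if not (f(tau) > 0.0 and d1 > 0.0 and d2 <= tol):
            return False
    return True
```

A call such as `satisfies_lemma_conditions(lambda t: power_candidate(t, 0.5), [0.1, 1.0, 5.0])` only samples a few points, so it is a screening tool for candidate functions rather than a proof.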
It is also possible to find ansätze for upper-bounded subharmonic κ. The following functions are candidates for upper-bounded and subharmonic κ.
Example 2. For positive real numbers ar and bi, the following subharmonic functions are candidates for κ whenever they are upper-bounded:
Proof. Let us assume that the ansätze are upper-bounded in given domains. For κ1, it is easy to show that the Kähler potential K is subharmonic:
The proof for subharmonicity of κ2 is as follows:
The subharmonicity of κ3 is tested by
If the upper-boundedness is satisfied, the above subharmonic functions are ansätze for κ. □
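For the first ansatz, κ1 = K, the subharmonicity follows in one line from the relation g_{i j̄} = ∂_i ∂_j̄ K. The sketch below assumes that the Kähler Laplace–Beltrami operator is normalized as Δ = 2 g^{i j̄} ∂_i ∂_j̄; the positive prefactor does not affect the sign of the result.

```latex
% n is the complex dimension of the Kähler manifold
\begin{equation*}
\Delta K = 2\, g^{i\bar j}\, \partial_i \partial_{\bar j} K
         = 2\, g^{i\bar j}\, g_{i\bar j}
         = 2n > 0 .
\end{equation*}
```

The Kähler potential is therefore subharmonic on any Kählerian information manifold, so only its upper-boundedness needs to be checked model by model.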
Superharmonic prior functions on Kähler manifolds are efficiently constructed by the following algorithm, which exploits Theorem 1 and the ansätze for Ψ and κ. If a positive superharmonic function is found directly, it is already a Komaki-style prior function. If positive, upper-bounded, and (sub-)harmonic functions are found, they are plugged into Theorem 1 in order to obtain superharmonic prior functions. Multiplying the Jeffreys prior by the superharmonic prior functions then yields the geometric shrinkage priors. Additionally, since the ansätze are already given, there is no extra cost in finding the Komaki prior functions except for verifying whether or not the information geometry is a Kähler manifold. Compared with the literature on the Komaki priors of time series models [5–7], obtaining the geometric priors on Kähler manifolds is more efficient and more robust.
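The steps of the algorithm can be sketched numerically on a toy model. The example below is only an illustration under several assumptions that are not in the text: it uses an AR(1) filter with a complex pole ξ = x + iy, for which K = Σ_{r≥1} |ξ|^{2r}/r² and g_{ξξ̄} = 1/(1 − |ξ|²); it takes κ = K with u* = π²/6 (the supremum of K on the unit disk) and Ψ(τ) = τ^a; and it evaluates the Kähler Laplace–Beltrami operator 2 g^{ξξ̄} ∂_ξ ∂_ξ̄ by finite differences, using ∂_ξ ∂_ξ̄ = (∂_x² + ∂_y²)/4. All function names are hypothetical.

```python
import math

U_STAR = math.pi ** 2 / 6  # supremum of K on |xi| < 1, since Li_2(1) = pi^2 / 6


def kahler_potential(x, y, terms=400):
    """K = sum_{r>=1} |xi|^(2r) / r^2 for the AR(1) pole xi = x + i y (truncated series)."""
    t = x * x + y * y
    return sum(t ** r / r ** 2 for r in range(1, terms + 1))


def prior_function(x, y, a):
    """psi = (u* - K)^a, i.e. Psi(tau) = tau^a applied to tau = u* - K."""
    return (U_STAR - kahler_potential(x, y)) ** a


def laplace_beltrami(f, x, y, step=1e-3):
    """Kahler Laplace-Beltrami operator 2 * g^{-1} * d_xi d_xibar f, with g^{-1} = 1 - |xi|^2.

    Uses d_xi d_xibar = (d_x^2 + d_y^2) / 4 and a five-point finite-difference stencil.
    """
    flat_laplacian = (f(x + step, y) + f(x - step, y) + f(x, y + step)
                      + f(x, y - step) - 4.0 * f(x, y)) / step ** 2
    return 2.0 * (1.0 - (x * x + y * y)) * flat_laplacian / 4.0


def is_superharmonic_on_grid(a, radii=(0.2, 0.5, 0.8), angles=8):
    """Return True if the Laplace-Beltrami operator on psi is negative at all sample poles."""
    for radius in radii:
        for k in range(angles):
            phi = 2.0 * math.pi * k / angles
            x, y = radius * math.cos(phi), radius * math.sin(phi)
            if laplace_beltrami(lambda u, v: prior_function(u, v, a), x, y) >= 0.0:
                return False
    return True
```

For a = 1 the prior function is u* − K and the operator evaluates to −2 at every pole, in agreement with ΔK = 2n for n = 1. A grid check of this kind only screens a candidate and does not replace the analytic argument of Theorem 1.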
4. Example: ARFIMA Models
The ARFIMA model is the generalization of the ARMA model with a fractional differencing parameter in order to model the long memory process. The transfer function of the ARFIMA(p, d, q) model with parameters
is given by
where d is the differencing parameter and μi, λi, σ are a pole, a root, and a gain in the ARMA model, respectively. It is noteworthy that the transfer function of the ARFIMA model is decomposed into the ARMA model part and the fractional integration part. Additionally, all poles and roots of the linear system are located inside the unit disk, i.e., |λi| < 1 for i = 1, ⋯, p and |μi| < 1 for i = 1, ⋯, q.
Similar to the ARMA case, the full geometry of the ARFIMA model is a Kähler manifold, and the submanifold of a constant gain σ is also a Kähler manifold. This submanifold also exhibits the explicit Hermitian condition on the metric tensor. It is easy to cross-check the Hermitian structure by fixing h0 = 1 up to the gain of the signal filter. We will work on this submanifold.
Since the information geometry of the ARFIMA model is a Kähler manifold, the Kähler potential of the ARFIMA geometry is obtained from the square of the Hardy norm of the logarithmic transfer function (or the square of the complex cepstrum norm), Equation (5), represented with
It is obvious that the Kähler potential for the ARFIMA model, Equation (8), is reducible to the Kähler potential of the ARMA geometry by setting d = 0. It is easy to verify that the Kähler potential of the ARFIMA geometry is upper-bounded by
By using Equation (4), the metric tensor of the Kähler geometry is simply derived from the Kähler potential. The metric tensor of the Kähler-ARFIMA geometry is given by
and it is easy to show that the metric tensor contains the pure ARMA metric. The metric tensor also has a form similar to that of the ARFIMA geometry in non-complexified coordinates. The metric tensor indicates that the ARMA geometry is embedded in the ARFIMA geometry and corresponds to a submanifold of the ARFIMA manifold. The ARMA part of the metric tensor is identical to the metric of the Kähler-ARMA geometry in Choi and Mullhaupt. In addition, this cross-checks the fact that the ARMA geometry is also a Kähler manifold, based on the property that a submanifold of a Kähler manifold is Kähler.
Other geometric objects can be derived from the metric tensor. For example, the non-trivial components of the 0-connection are given by Equation (6). It is noteworthy that any connection components with the d-coordinate in the first two indices of the connection are trivially zero and the others might not be vanishing. Similar to the 0-connection, the Ricci tensor components along the fractionally integrated direction are also zero because there is no dependence on d in the metric tensor. Considering the Schur complement, the non-vanishing Ricci tensor components are decomposed into the Ricci tensor from the pure ARMA part and the term from the mixing between the ARMA part and the fractionally integrated (FI) part:
where i and j are not along the d-coordinate.
We now return to the geometric shrinkage priors. Since the Kähler potential of a given ARFIMA model is upper-bounded by a constant
, the intrinsic priors on the Kähler manifold can be found as it is proven in the previous section. By using the algorithm and the ansätze related to the Kähler potential, some geometric shrinkage prior functions for the ARFIMA model are constructed as
where 0 < a ≤ 1. It is also noteworthy that when d = 0 in the Kähler potential, superharmonic priors of the ARMA (or AR/MA) models are obtained, and finding the priors becomes much simpler than in the literature on the Komaki priors of the AR models [5–7]. Similarly, κ2 and κ3 can also be used for the superharmonic prior function ansätze in the ARFIMA models because both functions are upper-bounded on the ARFIMA manifold. Moreover, if we set d = 0 for κ2 or b0 = 0 for κ3, the ansätze for the ARFIMA models reduce to the Komaki priors of the ARMA models.
5. Conclusion
In this paper, we have built an algorithm and ansätze for the geometric shrinkage priors of Kählerian signal filters. By using the properties of Kähler manifolds, an algorithm to find the Komaki priors is constructed and ansätze for the prior functions are suggested. Additionally, some ansätze associated with the Kähler potential are geometrically intrinsic to Kählerian information manifolds because the geometry is derived from the Kähler potential, which is the square of the complex cepstrum norm of a linear system.
Compared with the literature on the Komaki priors of time series models, verification of the geometric priors is much easier on the Kähler manifold, and it is also possible to acquire the geometric shrinkage priors for highly complicated models in a more efficient and robust way. For example, Bayesian predictive priors for the ARFIMA model are obtained from the algorithm and the ansätze for the prior functions. The shrinkage priors of the ARMA cases are simply found from the geometric shrinkage priors of the ARFIMA models by using the property of submanifolds in Kähler geometry.
Acknowledgments
We are thankful to Michael Tiano for useful discussions.
Author Contributions
Both authors contributed equally to the main idea. The research was carried out by both authors. Jaehyung Choi wrote the paper. Both authors have read and approved the final manuscript.
Conflicts of Interest
The authors declare no conflict of interest.
References
1. Amari, S.; Nagaoka, H. Methods of Information Geometry; Oxford University Press: Oxford, UK, 2000.
2. Ravishanker, N.; Melnick, E. L.; Tsai, C. Differential geometry of ARMA models. J. Time Ser. Anal. 1990, 11, 259–274.
3. Ravishanker, N. Differential geometry of ARFIMA processes. Commun. Stat. Theory Methods 2001, 30, 1889–1902.
4. Barbaresco, F. Information intrinsic geometric flows. AIP Conf. Proc. 2006, 872, 211–218.
5. Tanaka, F.; Komaki, F. A superharmonic prior for the autoregressive process of the second order. J. Time Ser. Anal. 2008, 29, 444–452.
6. Tanaka, F. Superharmonic priors for autoregressive models; Mathematical Engineering Technical Reports; University of Tokyo: Tokyo, Japan, 2009.
7. Choi, J.; Mullhaupt, A. P. Kählerian information geometry for signal processing. arXiv:1404.2006.
8. Komaki, F. Shrinkage priors for Bayesian prediction. Ann. Stat. 2006, 34, 808–819.
9. Barndorff-Nielsen, O. E.; Jupp, P. E. Statistics, yokes and symplectic geometry. Annales de la faculté des sciences de Toulouse 6 série 1997, 6, 389–427.
10. Barbaresco, F. Information geometry of covariance matrix: Cartan-Siegel homogeneous bounded domains, Mostow/Berger fibration and Fréchet median. In Matrix Information Geometry; Bhatia, R., Nielsen, F., Eds.; Springer: Berlin and Heidelberg, Germany, 2012; pp. 199–256.
11. Zhang, J.; Li, F. Symplectic and Kähler structures on statistical manifolds induced from divergence functions. Geom. Sci. Inf. 2013, 8085, 595–603.
12. Barbaresco, F. Koszul information geometry and Souriau geometric temperature/capacity of Lie group thermodynamics. Entropy 2014, 16, 4521–4565.
13. Bogert, B.; Healy, M.; Tukey, J. The quefrency alanysis of time series for echoes: cepstrum, pseudo-autocovariance, cross-cepstrum and saphe cracking. In Proceedings of the Symposium on Time Series Analysis, Brown University, Providence, RI, USA, 11–14 June 1963; pp. 209–243.
14. Martin, R. J. A metric for ARMA processes. IEEE Trans. Signal Process. 2000, 48, 1164–1170.
15. Oppenheim, A. V. Superposition in a class of nonlinear systems. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 1965.
16. Nakahara, M. Geometry, Topology and Physics; Institute of Physics Publishing: Bristol, UK and Philadelphia, PA, USA, 2003.