Geometric shrinkage priors for Kählerian signal filters

We construct geometric shrinkage priors for Kählerian signal filters. Based on the characteristics of Kähler manifolds, an efficient and robust algorithm for finding superharmonic priors which outperform the Jeffreys prior is introduced. Several ansätze for the Bayesian predictive priors are also suggested. In particular, the ansätze related to the Kähler potential are geometrically intrinsic priors on the information manifold, because the geometry of the manifold is derived from the potential. The implications of the algorithm for time series models are also provided.


Introduction
In information geometry, signal processing is one of the most important applications. In particular, the information geometric approach to various linear time series models is well known [1][2][3][4][5][6][7]. The geometric description of linear systems is not confined to the pursuit of mathematical beauty. Komaki's work [8] develops practical tools for Bayesian inference. Using the Kullback-Leibler divergence as a risk function for estimation, he found that superharmonic shrinkage priors outperform the Jeffreys prior from the viewpoint of information theory. Better prediction in the Bayesian framework is attainable with the Komaki priors.
However, a difficult part of Komaki's idea in practice is verifying whether or not a prior function is superharmonic. In particular, when high-dimensional statistical manifolds are considered, it is technically tricky to test the superharmonicity of prior functions because the Laplace-Beltrami operators on the manifolds are non-trivial. Although some superharmonic priors for the autoregressive (AR) models have been found, not only in the two-dimensional cases [5,7] but also in arbitrary dimensions [6], little is known about the Bayesian shrinkage priors of more complicated models such as the autoregressive moving average (ARMA) models, the fractionally integrated ARMA (ARFIMA) models, and arbitrary signal filters. Additionally, generic algorithms for systematically obtaining the information shrinkage priors are not yet known.
The connection between Kähler manifolds and information geometry has been reported [4,9,10,11,12], and the mathematical correspondence between a Kähler manifold and the information geometry of a linear system was recently revealed. It was found that the information geometry of a signal filter with a finite complex cepstrum norm is a Kähler manifold [7]. In particular, the Hermitian condition on the Kählerian information manifolds is clearly seen under conditions on the transfer function of the linear system. Moreover, many practical aspects of introducing Kähler manifolds to information geometry for signal processing were also reported in the same literature [7]. One of the benefits of the Kählerian information geometry is that the simpler form of the Laplace-Beltrami operator on a Kähler manifold makes it easier to find the Komaki priors.
In this paper, we construct Komaki-style shrinkage priors for Kählerian signal filters. By introducing an algorithm which is based on the characteristics of Kähler manifolds, the Bayesian predictive priors outperforming the Jeffreys prior can be obtained in a more efficient and more robust way. Several prior ansätze are also suggested. Among the ansätze, the geometric shrinkage priors related to the Kähler potential are intrinsic priors on the information manifold because the geometry is given by the Kähler potential. We also provide the geometric priors for the ARFIMA models, for which the Komaki priors have not been reported. The structure of this paper is as follows. In the next section, the theoretical background of Kählerian information geometry and superharmonic priors is introduced. In Section 3, an algorithm and ansätze for the geometric shrinkage priors are suggested. The implications of the algorithm for the ARFIMA models are given in Section 4. We conclude the paper in the last section.

Kählerian Filters
A linear filter with n-dimensional complex parameters ξ is characterized by a transfer function h(w; ξ) in the frequency domain w with
$$y(w) = h(w; \xi)\, x(w),$$
where y and x are complex output and input signals, respectively. A spectral density function S(w; ξ) is defined as the absolute square of the transfer function,
$$S(w; \xi) = |h(w; \xi)|^2,$$
and it is a real-valued measurable quantity.
In information geometry, it is well known from Amari and Nagaoka [1] that the geometry of a linear system is determined by the spectral density function S(w; ξ) under the stability condition, the minimum phase condition, and
$$\frac{1}{2\pi}\int_{-\pi}^{\pi} |\log S(w; \xi)|^2\, dw < \infty.$$
The last condition is also known as the finite unweighted norm of the power cepstrum of a filter [13,14]. For a linear system with the spectral density function satisfying the above conditions, the metric tensor of the information geometry is given by
$$g_{\mu\nu}(\xi) = \frac{1}{2\pi}\int_{-\pi}^{\pi} (\partial_\mu \log S)(\partial_\nu \log S)\, dw,$$
where the partial derivatives are taken with respect to the model parameters ξ.
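The metric tensor can be expressed in a complexified coordinate system and the Z-transformed transfer function. With the Z-transformation, the holomorphic transfer function can be written as a series expansion in z,
$$h(z; \xi) = \sum_{r=0}^{\infty} h_r(\xi)\, z^{-r}, \tag{1}$$
where $h_r$ is an impulse response function. The Z-transformed power spectrum is defined in a similar way. In this case, the conditions on the transfer function for constructing the information geometry are identical to those in the spectral density function representation except for
$$\frac{1}{2\pi i}\oint_{|z|=1} |\log h(z; \xi)|^2\, \frac{dz}{z} < \infty,$$
which is a necessary condition for the finite power cepstrum norm. The condition indicates that the Hardy norm of the logarithmic transfer function, also known as the unweighted complex cepstrum norm [14,15], is finite. The metric tensor of the geometry is given by the transfer function,
$$g_{ij} = \frac{1}{2\pi i}\oint_{|z|=1} \partial_i \log h(z; \xi)\, \partial_j \log h(z; \xi)\, \frac{dz}{z}, \tag{2}$$
$$g_{i\bar{j}} = \frac{1}{2\pi i}\oint_{|z|=1} \partial_i \log h(z; \xi)\, \partial_{\bar{j}} \log \bar{h}(\bar{z}; \bar{\xi})\, \frac{dz}{z}, \tag{3}$$
where i, j run from 1 to n and $g_{\bar{i}\bar{j}}$, $g_{\bar{i}j}$ are the complex conjugates of $g_{ij}$ and $g_{i\bar{j}}$, respectively. After plugging the Z-transformed transfer function, Equation (1), into the metric tensor expressions, Equations (2) and (3), the metric tensor is expressed with the series expansion coefficients in z of the logarithmic transfer function by
$$g_{ij} = \partial_i \eta_0\, \partial_j \eta_0, \qquad g_{i\bar{j}} = \partial_i \eta_0\, \partial_{\bar{j}} \bar{\eta}_0 + \sum_{r=1}^{\infty} \partial_i \eta_r\, \partial_{\bar{j}} \bar{\eta}_r,$$
where $\eta_r$ is the coefficient of $z^{-r}$ in the series expansion of the logarithmic transfer function, also known as a complex cepstrum coefficient [15]. It is obvious that $\eta_0 = \log h_0$.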
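The complex cepstrum coefficients can be checked numerically. The following is a minimal sketch, assuming a hypothetical ARMA-type filter $h(z) = \prod_i (1 - \mu_i z^{-1}) / \prod_i (1 - \lambda_i z^{-1})$ with $h_0 = 1$ and all $|\mu_i|, |\lambda_i| < 1$; the function names and parameter values are illustrative and are not taken from the paper. It evaluates $\eta_r$ as a contour integral on the unit circle and compares it with the closed form $(\sum_i \lambda_i^r - \sum_i \mu_i^r)/r$ that follows from the Taylor series of the logarithm.

```python
import cmath
import math

def log_transfer(z, roots, poles):
    """log h(z) for the assumed filter h(z) = prod(1 - mu/z) / prod(1 - lam/z)."""
    # Each factor has positive real part on |z| = 1 when |mu|, |lam| < 1,
    # so summing principal logarithms never crosses a branch cut.
    num = sum(cmath.log(1 - mu / z) for mu in roots)
    den = sum(cmath.log(1 - lam / z) for lam in poles)
    return num - den

def cepstrum(r, roots, poles, n_grid=512):
    """eta_r = (1 / 2 pi) * integral of log h(e^{iw}) e^{irw} dw, trapezoid rule."""
    total = 0j
    for k in range(n_grid):
        z = cmath.exp(2j * math.pi * k / n_grid)
        total += log_transfer(z, roots, poles) * z ** r
    return total / n_grid

roots = [0.3 + 0.2j]   # hypothetical zero mu_1
poles = [0.5 - 0.4j]   # hypothetical pole lambda_1

eta_numeric = [cepstrum(r, roots, poles) for r in range(6)]
# Closed form from log(1 - a/z) = -sum_r a^r z^{-r} / r; eta_0 = log h_0 = 0.
eta_closed = [0j] + [
    (sum(lam ** r for lam in poles) - sum(mu ** r for mu in roots)) / r
    for r in range(1, 6)
]
```

Because the integrand is smooth and periodic, the trapezoid rule converges very quickly here, so a modest grid is enough for this check.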
Recently, it was found by Choi and Mullhaupt [7] that the information geometry of a linear system with a finite Hardy norm of the logarithmic transfer function (or the complex cepstrum norm) is a Kähler manifold, i.e., a Hermitian manifold with a closed Kähler two-form: $g_{ij} = g_{\bar{i}\bar{j}} = 0$ for the Hermitian manifold and $\partial_i g_{j\bar{k}} = \partial_j g_{i\bar{k}}$, $\partial_{\bar{i}} g_{k\bar{j}} = \partial_{\bar{j}} g_{k\bar{i}}$ for the closed Kähler two-form. Additionally, the Hermitian structure can be explicitly seen in the metric tensor if and only if the impulse response function with the highest degree in z, i.e., $h_0$ in the unilateral transfer function case, is constant in the model parameters ξ. In this paper, for simplicity, we only consider unilateral transfer functions with non-zero $h_0$ and the Kähler manifolds with the explicit Hermitian conditions on the metric tensors because complex manifolds are always Hermitian manifolds [16]. In this case, the necessary and sufficient condition for being a Kähler manifold is that $h_0(\xi)$ is constant in ξ [7].
According to Choi and Mullhaupt [7], the benefits of the Kählerian description are as follows. First of all, geometric objects are straightforwardly computed on a Kähler manifold. The non-trivial metric tensor component is simply derived from the formula
$$g_{i\bar{j}} = \partial_i \partial_{\bar{j}} K, \tag{4}$$
where K is the Kähler potential of the geometry. The Kähler potential in the information geometry of a linear filter is the square of the Hardy norm (or $H^2$-norm) of the logarithmic transfer function (or the square of the complex cepstrum norm) on the unit disk D,
$$K = \frac{1}{2\pi i}\oint_{|z|=1} |\log h(z; \xi)|^2\, \frac{dz}{z} = \sum_{r=0}^{\infty} |\eta_r|^2, \tag{5}$$
and the details of the derivation are given in the literature [7]. The non-trivial components of the Levi-Civita connection are expressed as
$$\Gamma_{ij,\bar{k}} = \partial_i \partial_j \partial_{\bar{k}} K, \tag{6}$$
together with their complex conjugates, and the other connection components all vanish. Notice that this is much simpler than the connection components on a non-Kähler manifold, given by
$$\Gamma_{ij,k} = \frac{1}{2}\left(\partial_i g_{jk} + \partial_j g_{ik} - \partial_k g_{ij}\right),$$
and it is obvious that the number of calculation steps is significantly reduced in the Kähler case. The Riemann curvature tensor of the linear system geometry is also represented in a simpler form, which is given in Choi and Mullhaupt [7]. The Ricci tensor on the Kähler manifold is obtained as
$$R_{i\bar{j}} = -\partial_i \partial_{\bar{j}} \log G,$$
where G is the determinant of the metric tensor. It is evident that we can skip the calculation of the Riemann curvature tensor in order to compute the Ricci tensor on a Kähler manifold. Additionally, the α-generalization of the geometric objects is linear in α on Kähler manifolds. Since the Riemann curvature tensor on a Kähler manifold is linear in the α-connection, which is α-linear, the Riemann tensor also exhibits the α-linearity, which leads to the α-linear Ricci tensor and scalar curvature.
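The relation between the Kähler potential and the metric tensor can be illustrated on the simplest case. The sketch below assumes a one-pole filter $h(z) = 1/(1 - \xi z^{-1})$ with $|\xi| < 1$, for which $\eta_r = \xi^r / r$ and hence $K = \sum_{r \ge 1} |\xi|^{2r}/r^2$; the function names, the step size, and the sample point are illustrative assumptions. It differentiates K numerically, using $\partial_\xi \partial_{\bar{\xi}} = (\partial_x^2 + \partial_y^2)/4$ for $\xi = x + iy$, and compares the result with the closed form $1/(1 - |\xi|^2)$ obtained by differentiating the series term by term.

```python
def kahler_potential(xi, n_terms=200):
    """K = sum_{r>=1} |eta_r|^2 with eta_r = xi^r / r (assumed one-pole filter)."""
    rho = abs(xi) ** 2
    return sum(rho ** r / r ** 2 for r in range(1, n_terms + 1))

def metric_from_potential(xi, step=1e-3):
    """g = d_xi d_xibar K, computed as a quarter of the flat Laplacian of K."""
    flat_laplacian = (
        kahler_potential(xi + step) + kahler_potential(xi - step)
        + kahler_potential(xi + 1j * step) + kahler_potential(xi - 1j * step)
        - 4 * kahler_potential(xi)
    ) / step ** 2
    return flat_laplacian / 4

xi = 0.4 + 0.3j                       # hypothetical pole inside the unit disk
g_numeric = metric_from_potential(xi)
g_closed = 1 / (1 - abs(xi) ** 2)     # term-by-term derivative of the series
```

Only the potential is needed as input; no integral over the frequency domain has to be evaluated to obtain the metric.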
In addition to these advantages, any submanifold of a Kähler manifold is also a Kähler manifold. If the information geometry of a given statistical model is a Kähler manifold, the information geometry of its submodels is also Kähler, and all the properties of the ambient manifold are inherited by the submanifolds.
Lastly, the Kählerian information geometry is also useful for finding superharmonic priors because of the simpler Laplace-Beltrami operators on the manifolds. We will cover the details of the superharmonic priors in the next subsection.

Superharmonic Priors
For further discussions, we need to introduce the superharmonic priors suggested by Komaki [8].
When we want to find the true probability distribution p(y|ξ) based on given samples $x^{(N)}$ of size N, one of the best approaches is using the Bayesian predictive density $p_\pi(y|x^{(N)})$ with a prior π(ξ):
$$p_\pi(y|x^{(N)}) = \frac{\int p(y|\xi)\, p(x^{(N)}|\xi)\, \pi(\xi)\, d\xi}{\int p(x^{(N)}|\xi)\, \pi(\xi)\, d\xi}.$$
The superharmonic priors $\pi_I$ are derived from the difference between two risk functions with respect to the true probability density, one from the Jeffreys prior and another from the superharmonic prior:
$$E\left[D_{KL}\left(p(y|\xi)\,\|\,p_{\pi_J}(y|x^{(N)})\right)\big|\,\xi\right] - E\left[D_{KL}\left(p(y|\xi)\,\|\,p_{\pi_I}(y|x^{(N)})\right)\big|\,\xi\right] = \frac{1}{2N^2}\, g^{ij}\, \partial_i \log\frac{\pi_I}{\pi_J}\, \partial_j \log\frac{\pi_I}{\pi_J} - \frac{1}{N^2}\, \frac{\pi_J}{\pi_I}\, \Delta\frac{\pi_I}{\pi_J} + o(N^{-2}),$$
where $D_{KL}$ is the Kullback-Leibler divergence and $\pi_J$ is the Jeffreys prior, which is the volume form of the statistical manifold. Each risk function indicates how far, on average, a given Bayesian predictive density is from the true distribution in the Kullback-Leibler divergence. Since better priors are obtained from smaller risk functions, the priors outperforming the Jeffreys prior make the above expression greater than zero. Since the first term on the right-hand side is non-negative, the risk function of the Komaki prior is decreased with respect to the risk function of the Jeffreys prior if a prior function $\psi = \pi_I/\pi_J$ is superharmonic. If a superharmonic prior function ψ can be found, it is possible to do better Bayesian prediction from the viewpoint of information theory. In the same paper, Komaki also pointed out that shrinkage priors improve on the Jeffreys prior in prediction, in the information-theoretic sense, if and only if the square root of a prior function is superharmonic.
Since Komaki's paper [8], several superharmonic priors for the AR models have been found [5][6][7]. A Komaki prior for the AR(2) model was obtained in the pole coordinates of the transfer function [5]. Tanaka [6] generalized the two-dimensional case to superharmonic priors for the AR model in an arbitrary dimension p, where the shrinkage prior function for the AR(p) model is again expressed in terms of the poles of the AR transfer function.
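The link between the two superharmonicity criteria attributed to Komaki can be made explicit with a short calculation. The sketch below assumes that the leading $N^{-2}$ coefficient of the risk difference consists of a non-negative gradient term and a Laplacian term, as described in the text, with $\psi = \pi_I/\pi_J$; the symbol $R(\psi)$ is shorthand introduced here for illustration only.

```latex
\begin{align}
R(\psi) &= \frac{1}{2}\, g^{ij}\, \partial_i \log\psi\, \partial_j \log\psi
           - \frac{1}{\psi}\, \Delta\psi, \\
\Delta\sqrt{\psi} &= \frac{1}{2\sqrt{\psi}}\, \Delta\psi
           - \frac{1}{4\,\psi^{3/2}}\, g^{ij}\, \partial_i\psi\, \partial_j\psi, \\
R(\psi) &= -\frac{2}{\sqrt{\psi}}\, \Delta\sqrt{\psi}.
\end{align}
```

Under this assumption, $R(\psi)$ is positive exactly when $\sqrt{\psi}$ is strictly superharmonic, while superharmonicity of ψ itself is a sufficient condition because the gradient term is non-negative.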
As mentioned before, one of the advantages of the Kählerian description is that finding the Komaki prior functions becomes more efficient than in the non-Kähler description because the Laplace-Beltrami operators on Kähler manifolds take simpler forms. For a differentiable function ψ, the Laplace-Beltrami operator in the Kähler geometry is represented by
$$\Delta\psi = 2 g^{i\bar{j}}\, \partial_i \partial_{\bar{j}} \psi.$$
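Meanwhile, the Laplace-Beltrami operator on a non-Kähler manifold is expressed as
$$\Delta\psi = \frac{1}{\sqrt{G}}\, \partial_i\left(\sqrt{G}\, g^{ij}\, \partial_j \psi\right) = g^{ij}\, \partial_i \partial_j \psi + \left(\partial_i g^{ij}\right) \partial_j \psi + g^{ij} \left(\partial_i \log\sqrt{G}\right) \partial_j \psi,$$
where G is the determinant of the metric tensor. It is obvious that additional calculations for the latter two terms on the right-hand side are indispensable in the non-Kähler cases.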
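A superharmonicity test on a Kähler manifold therefore only needs the inverse metric and mixed second derivatives. The following minimal sketch assumes the one-dimensional metric $g_{\xi\bar{\xi}} = 1/(1 - |\xi|^2)$ of a one-pole filter and uses two hypothetical test functions, $1 - |\xi|^2$ and $|\xi|^2$, chosen only for illustration; neither is claimed to be a prior from the literature.

```python
def laplace_beltrami(psi, xi, step=1e-3):
    """Delta psi = 2 g^{xi xibar} d_xi d_xibar psi for the assumed one-pole metric."""
    # d_xi d_xibar equals a quarter of the flat Laplacian in (Re xi, Im xi).
    flat_laplacian = (
        psi(xi + step) + psi(xi - step)
        + psi(xi + 1j * step) + psi(xi - 1j * step)
        - 4 * psi(xi)
    ) / step ** 2
    inverse_metric = 1 - abs(xi) ** 2   # inverse of 1 / (1 - |xi|^2)
    return 2 * inverse_metric * flat_laplacian / 4

def candidate(xi):
    """Hypothetical positive test function on the unit disk."""
    return 1 - abs(xi) ** 2

def counterexample(xi):
    """Hypothetical subharmonic function, for contrast."""
    return abs(xi) ** 2

xi = 0.4 + 0.3j
value_super = laplace_beltrami(candidate, xi)       # analytic value: -2 (1 - |xi|^2)
value_sub = laplace_beltrami(counterexample, xi)    # analytic value: +2 (1 - |xi|^2)
```

No derivative of the metric or of its determinant is required, in contrast to the general expression for a non-Kähler manifold.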
With the computational benefits on the Kählerian information manifolds, a superharmonic prior function for the Kähler-AR(2) model was found [7] in terms of the poles $\xi^i$ of the transfer function and their complex conjugates $\bar{\xi}^i$. However, its generalization to arbitrary dimensions has been unknown. Moreover, the Komaki priors for the ARMA models and the ARFIMA models have not been reported yet.

Geometric Shrinkage Priors
As shown in the previous section, Kähler manifolds in information geometry are useful in order to obtain the superharmonic priors. In this section, we introduce an algorithm to find the geometric shrinkage priors by using the properties of Kähler geometry. Moreover, several ansätze for the priors are suggested.
According to Lemma 1, superharmonic functions are easily obtained from subharmonic or harmonic functions by simply plugging the (sub-)harmonic functions as κ into Lemma 1.
Proof. Since this is a special case of Lemma 1, the proof is obvious.
Although any (sub-)harmonic function κ can be used for constructing superharmonic priors, a restriction on κ makes finding the ansätze of the geometric priors easier. From now on, we consider only upper-bounded functions. Additionally, we assume that κ and $u^*$ are real. With these assumptions, it is possible to set $u^*$ as a constant greater than the upper bound of κ so that $\tau = u^* - \kappa$ is positive.
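One way to see why composing a function Ψ with $\tau = u^* - \kappa$ can produce superharmonic functions is the chain rule for the Kähler Laplace-Beltrami operator. The sketch below assumes, for illustration only, that Ψ is twice differentiable with $\Psi'(\tau) > 0$ and $\Psi''(\tau) \le 0$ and that κ is real; these conditions are assumptions of the sketch rather than a quotation of Lemma 1.

```latex
\begin{align}
\psi &= \Psi(\tau), \qquad \tau = u^{*} - \kappa, \\
\Delta\psi &= 2 g^{i\bar{j}}\, \partial_i \partial_{\bar{j}} \Psi(\tau)
  = \Psi'(\tau)\, \Delta\tau
  + 2\, \Psi''(\tau)\, g^{i\bar{j}}\, \partial_i \tau\, \partial_{\bar{j}} \tau \\
 &= -\Psi'(\tau)\, \Delta\kappa
  + 2\, \Psi''(\tau)\, g^{i\bar{j}}\, \partial_i \kappa\, \partial_{\bar{j}} \kappa .
\end{align}
```

Since the metric is positive definite and κ is real, $g^{i\bar{j}}\, \partial_i \kappa\, \partial_{\bar{j}} \kappa \ge 0$, so both terms are non-positive whenever $\Delta\kappa \ge 0$, which gives $\Delta\psi \le 0$ under the stated assumptions.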
Ansätze for Ψ can be found in the following example.
Proof. We only cover the subharmonic case for κ here; the harmonic case is equally straightforward. First of all, $\Psi_1$ and $\Psi_2$ are both positive. It is easy to verify the remaining conditions for $\Psi_1$, and a similar calculation is repeated for $\Psi_2$. Both functions $\Psi_1$ and $\Psi_2$ therefore satisfy the conditions for Ψ in Lemma 1.
It is also possible to find ansätze for upper-bounded subharmonic κ. The following functions are candidates for upper-bounded and subharmonic κ.
Example 2. For positive real numbers $a_r$ and $b_i$, the subharmonic functions $\kappa_1$, $\kappa_2$, and $\kappa_3$ are candidates for κ whenever they are upper-bounded.
Proof. Let us assume that the ansätze are upper-bounded in the given domains. For $\kappa_1$, it is easy to show that the Kähler potential K is subharmonic:
$$\Delta K = 2 g^{i\bar{j}}\, \partial_i \partial_{\bar{j}} K = 2 g^{i\bar{j}} g_{i\bar{j}} = 2n > 0,$$
where n is the complex dimension of the manifold. The subharmonicity of $\kappa_2$ and of $\kappa_3$ is tested in the same way, by applying the Laplace-Beltrami operator to each function. If the upper-boundedness is satisfied, the above subharmonic functions are ansätze for κ.
Superharmonic prior functions on the Kähler manifolds are efficiently constructed from the following algorithm, which exploits Theorem 1 and the ansätze for Ψ and κ. When we find positive and superharmonic functions, they are automatically Komaki-style prior functions as usual. If positive, upper-bounded, and (sub-)harmonic functions are found, those functions are plugged into Theorem 1 in order to obtain superharmonic prior functions. Multiplying the Jeffreys prior by the superharmonic prior functions, we finally acquire the geometric shrinkage priors. Additionally, since the ansätze are already given, there is no extra cost to find the Komaki prior functions except for verifying whether or not the information geometry is a Kähler manifold. Compared with the literature on the Komaki priors of the time series models [5][6][7], obtaining the geometric priors on the Kähler manifolds becomes more efficient and more robust.
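The steps above can be sketched numerically on a toy case. This is a minimal illustration under stated assumptions, not the authors' implementation: it takes a one-pole filter with $K = \sum_{r \ge 1} |\xi|^{2r}/r^2$ and metric $1/(1 - |\xi|^2)$, chooses $\kappa = K$, sets $u^* = \pi^2/6$ (the supremum of this K on the unit disk), and uses the power ansatz $\Psi(\tau) = \tau^a$ with $0 < a \le 1$. All function names and grid points are hypothetical.

```python
import math

U_STAR = math.pi ** 2 / 6   # assumed bound: K < sum 1/r^2 for |xi| < 1

def kahler_potential(xi, n_terms=400):
    """kappa = K for the assumed one-pole filter."""
    rho = abs(xi) ** 2
    return sum(rho ** r / r ** 2 for r in range(1, n_terms + 1))

def prior_function(xi, a=0.5):
    """psi = Psi(u* - kappa) with the power ansatz Psi(tau) = tau ** a."""
    return (U_STAR - kahler_potential(xi)) ** a

def laplace_beltrami(psi, xi, step=1e-3):
    """Delta psi = 2 g^{xi xibar} d_xi d_xibar psi via central differences."""
    flat_laplacian = (
        psi(xi + step) + psi(xi - step)
        + psi(xi + 1j * step) + psi(xi - 1j * step)
        - 4 * psi(xi)
    ) / step ** 2
    return 2 * (1 - abs(xi) ** 2) * flat_laplacian / 4

def linear_prior(xi):
    """Special case a = 1, for which Delta psi = -Delta K = -2 analytically."""
    return prior_function(xi, a=1.0)

grid = [0.0j, 0.3 + 0.1j, -0.5 + 0.5j, 0.85j]
values = [laplace_beltrami(prior_function, xi) for xi in grid]
values_linear = [laplace_beltrami(linear_prior, xi) for xi in grid]
```

A grid check like this is only a sanity test at sample points; the analytic argument is what establishes superharmonicity on the whole manifold.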

Example: ARFIMA Models
The ARFIMA model is the generalization of the ARMA model with a fractional differencing parameter in order to model long memory processes. The transfer function of the ARFIMA(p, d, q) model with parameters $\xi = (\xi^{-1}, \xi^{0}, \xi^{1}, \cdots, \xi^{p+q}) = (\sigma, d, \lambda_1, \cdots, \lambda_p, \mu_1, \cdots, \mu_q)$ is decomposed into the ARMA model part and the fractional integration part, where d is the differencing parameter and $\mu_i$, $\lambda_i$, σ are a pole, a root, and a gain in the ARMA model, respectively. Additionally, all poles and roots of the linear system are located inside the unit disk, i.e., $|\lambda_i| < 1$ for $i = 1, \cdots, p$ and $|\mu_i| < 1$ for $i = 1, \cdots, q$. Similar to the ARMA case [7], the full geometry of the ARFIMA model is a Kähler manifold and the submanifold of a constant gain σ is also a Kähler geometry. This submanifold also exhibits the explicit Hermitian condition on the metric tensor. It is easy to cross-check the Hermitian structure by fixing $h_0 = 1$ up to the gain of the signal filter. We will work on this submanifold.
Since the information geometry of the ARFIMA model is a Kähler manifold, the Kähler potential of the ARFIMA geometry is obtained from the square of the Hardy norm of the logarithmic transfer function (or the square of the complex cepstrum norm), Equation (5). It is obvious that the Kähler potential for the ARFIMA model, Equation (8), reduces to the Kähler potential of the ARMA geometry by setting d = 0. It is easy to verify that the Kähler potential of the ARFIMA geometry is upper-bounded by $(d + p + q)^2 \pi^2/6$. By using Equation (4), the metric tensor of the Kähler-ARFIMA geometry is simply derived from the Kähler potential, and it is easy to show that the metric tensor contains the pure ARMA metric. The metric tensor also takes a form similar to that of the ARFIMA geometry in non-complexified coordinates [3]. The metric tensor indicates that the ARMA geometry is embedded in the ARFIMA geometry and corresponds to a submanifold of the ARFIMA manifold. The ARMA part of the metric tensor is the same as the metric of the Kähler-ARMA geometry in Choi and Mullhaupt [7]. In addition, we can cross-check the fact that the ARMA geometry is also a Kähler manifold based on the property that a submanifold of a Kähler geometry is Kähler.
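The upper bound and the reduction at d = 0 can be checked numerically. The sketch below is illustrative only: it assumes complex cepstrum coefficients of the form $\eta_r = (d + \sum_i \mu_i^r - \sum_i \lambda_i^r)/r$ for $r \ge 1$ with $\eta_0 = 0$ on the constant-gain submanifold. The sign conventions for d and for the poles and roots are an assumption and may differ from Equation (8), although the bound $(|d| + p + q)^2 \pi^2/6$ does not depend on them. Function names and parameter values are hypothetical.

```python
import math

def arfima_kahler_potential(d, lams, mus, n_terms=5000):
    """Truncated K = sum_{r>=1} |eta_r|^2 with the assumed cepstrum coefficients."""
    lam_pow = [1 + 0j] * len(lams)
    mu_pow = [1 + 0j] * len(mus)
    total = 0.0
    for r in range(1, n_terms + 1):
        # Running powers lambda_i^r and mu_i^r, updated by multiplication.
        lam_pow = [p * lam for p, lam in zip(lam_pow, lams)]
        mu_pow = [p * mu for p, mu in zip(mu_pow, mus)]
        eta_r = (d + sum(mu_pow) - sum(lam_pow)) / r
        total += abs(eta_r) ** 2
    return total

def upper_bound(d, lams, mus):
    """(|d| + p + q)^2 * pi^2 / 6, using |eta_r| <= (|d| + p + q) / r."""
    return (abs(d) + len(lams) + len(mus)) ** 2 * math.pi ** 2 / 6

d = 0.3                          # hypothetical differencing parameter
lams = [0.5 + 0.2j, -0.4j]       # hypothetical values with |lambda_i| < 1
mus = [0.6 + 0.0j]               # hypothetical value with |mu_i| < 1

k_arfima = arfima_kahler_potential(d, lams, mus)
bound = upper_bound(d, lams, mus)

# Setting d = 0 with a single lambda reproduces the one-pole series sum |lambda|^{2r} / r^2.
k_reduced = arfima_kahler_potential(0.0, [0.5 + 0.0j], [])
k_one_pole = sum(0.25 ** r / r ** 2 for r in range(1, 201))
```

Truncating the series can only lower the computed potential, so the comparison against the bound remains valid for any number of terms.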
Other geometric objects can be derived from the metric tensor. For example, the non-trivial components of the 0-connection are given by Equation (6). It is noteworthy that any connection components with the d-coordinate in the first two indices of the connection are trivially zero, whereas the others do not necessarily vanish. Similar to the 0-connection, the Ricci tensor components along the fractionally integrated direction are also zero because there is no dependence on d in the metric tensor. Considering the Schur complement, the non-vanishing Ricci tensor components, whose indices i and j are not along the d-coordinate, are decomposed into the Ricci tensor from the pure ARMA part and the term from the mixing between the ARMA part and the fractionally integrated (FI) part.
We now return to the geometric shrinkage priors. Since the Kähler potential of a given ARFIMA model is upper-bounded by a constant $u^* = (d + p + q)^2 \pi^2/6$, the intrinsic priors on the Kähler manifold can be found as proven in the previous section. By using the algorithm and the ansätze related to the Kähler potential, geometric shrinkage prior functions for the ARFIMA model are constructed from $\tau = u^* - K$ with an exponent a satisfying $0 < a \le 1$. It is also noteworthy that when d = 0 in the Kähler potential, superharmonic priors of the ARMA (or AR/MA) models are obtained, and finding the priors becomes much simpler than in the literature on the Komaki priors of the AR models [5][6][7]. Similarly, $\kappa_2$ and $\kappa_3$ are also utilized for the superharmonic prior function ansätze in the ARFIMA models because both functions are upper-bounded on the ARFIMA manifold. Moreover, if we set d = 0 for $\kappa_2$ or $b_0 = 0$ for $\kappa_3$, the ansätze for the ARFIMA models reduce to the Komaki priors of the ARMA models.
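A prior function built from the Kähler potential can be evaluated directly once K is available. The following sketch is a hedged illustration and not the paper's formula: it assumes the power form $\psi = (u^* - K)^a$ with $0 < a \le 1$, the bound $u^* = (|d| + p + q)^2 \pi^2/6$, and cepstrum coefficients $\eta_r = (d + \sum_i \mu_i^r - \sum_i \lambda_i^r)/r$, whose sign conventions are an assumption. The names `potential` and `shrinkage_prior_function` are hypothetical.

```python
import math

def potential(d, lams, mus, n_terms=5000):
    """Truncated Kahler potential for the assumed ARFIMA cepstrum coefficients."""
    lam_pow = [1 + 0j] * len(lams)
    mu_pow = [1 + 0j] * len(mus)
    total = 0.0
    for r in range(1, n_terms + 1):
        lam_pow = [p * lam for p, lam in zip(lam_pow, lams)]
        mu_pow = [p * mu for p, mu in zip(mu_pow, mus)]
        total += abs((d + sum(mu_pow) - sum(lam_pow)) / r) ** 2
    return total

def shrinkage_prior_function(d, lams, mus, a=0.5):
    """psi = (u* - K) ** a, an assumed power ansatz with 0 < a <= 1."""
    u_star = (abs(d) + len(lams) + len(mus)) ** 2 * math.pi ** 2 / 6
    return (u_star - potential(d, lams, mus)) ** a

# ARFIMA(1, d, 1) example with hypothetical parameter values.
psi_arfima = shrinkage_prior_function(0.2, [0.5 + 0.1j], [-0.3 + 0.0j])

# With d = 0 and no mu, the same function is evaluated on the one-pole submodel.
psi_near_origin = shrinkage_prior_function(0.0, [0.1 + 0.0j], [])
psi_near_circle = shrinkage_prior_function(0.0, [0.9 + 0.0j], [])
```

In the one-pole case the prior function decreases as the pole approaches the unit circle, which is the shrinkage behaviour toward the origin of the parameter space; multiplying by the Jeffreys prior would then give the full prior.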

Conclusion
In this paper, we build an algorithm and ansätze for the geometric shrinkage priors of Kählerian signal filters. By using the properties of Kähler manifolds, an algorithm to find the Komaki priors is constructed and ansätze for the prior functions are suggested. Additionally, some ansätze associated with the Kähler potential are geometrically intrinsic to Kählerian information manifolds because the geometry is derived from the Kähler potential, which is the square of the complex cepstrum norm of a linear system.
Compared with the literature on the Komaki priors of the time series models, verification of the geometric priors is much easier on the Kähler manifold, and it is also possible to acquire the geometric shrinkage priors for highly complicated models in a more efficient and robust way. For example, Bayesian predictive priors for the ARFIMA model are obtained from the algorithm and ansätze for the prior functions. The shrinkage priors of the ARMA cases are simply found from the geometric shrinkage priors of the ARFIMA models by using the property of submanifolds in the Kähler geometry.