Intrinsic Losses Based on Information Geometry and Their Applications

One main interest of information geometry is to study the properties of statistical models that do not depend on the coordinate systems or model parametrization; thus, it may serve as an analytic tool for intrinsic inference in statistics. In this paper, under the framework of Riemannian geometry and dual geometry, we revisit two commonly-used intrinsic losses which are respectively given by the squared Rao distance and the symmetrized Kullback–Leibler divergence (or Jeffreys divergence). For an exponential family endowed with the Fisher metric and α-connections, the two loss functions are uniformly described as the energy difference along an α-geodesic path, for some α ∈ {−1, 0, 1}. Subsequently, the two intrinsic losses are utilized to develop Bayesian analyses of covariance matrix estimation and range-spread target detection. We provide an intrinsically unbiased covariance estimator, which is verified to be asymptotically efficient in terms of the intrinsic mean square error. The decision rules deduced by the intrinsic Bayesian criterion provide a geometrical justification for the constant false alarm rate detector based on generalized likelihood ratio principle.


Introduction
In Bayesian analysis, the choice of a particular loss function and of a particular type of prior strongly influences the resulting inference. From the standpoint of pure inference, the final results are typically required to be independent of the way the model has been parameterized. Thus, in order to produce a Bayes estimator or action that depends only on the assumed model and the observed data, an intrinsic loss and a noninformative (or uninformative) prior are required. In an early paper [1], the intrinsic losses are also called noninformative losses, since they are regarded as the counterpart of noninformative priors. Reference [1] describes an intrinsic loss as one derived automatically from the sampling distribution without subjective inputs. The entropy and Hellinger losses are then suggested there as two appropriate distances between distributions, and their properties are studied, especially for exponential families. Some later works by Barcelona et al. additionally require that intrinsic losses be invariant under reduction to sufficient statistics [2]. A symmetric version of the Kullback-Leibler (KL) divergence (also named "the intrinsic discrepancy") is highly recommended as such an intrinsic loss, and has been widely applied to develop intrinsic Bayesian analyses of hypothesis testing [3,4], point estimation [4,5,6,7], and interval estimation [2,4,7].
The invariance criteria achieved by the intrinsic losses are consistent with one main topic of information geometry, which focuses on uncovering the properties of statistical manifolds that are preserved under coordinate transformations [8]. Under certain regularity conditions, each parametric statistical model possesses a Riemannian manifold structure given by the Fisher metric, with the parametrization playing the role of a coordinate system [9]. The parameters are merely labels for probability distributions, just as coordinates are labels for geometric objects. Moreover, the manifold structure supplied by the Fisher metric and the α-connections is proven to be invariant under transformations of coordinates and reduction by sufficiency [10]. Following this line, a concept is intrinsic if it has a well-defined geometrical meaning. The Riemannian distance in this case (also named the Rao distance [11]) is taken as an intrinsic measure of the dissimilarity between two probability distributions. Reference [12] regards the squared Rao distance between the models labelled by the actual value of the parameter and by its estimate as the most natural intrinsic loss, and uses it to seek intrinsic Bayes estimators in some concrete univariate examples. In the wider literature, statisticians also apply the Rao distance to motivate the intrinsic study of nonlinear filtering [13], estimation fusion [14,15], and target tracking [16]. In addition, intrinsic versions of the bias, mean square error (MSE), and Cramér-Rao bound (CRB) have been put forward [17,18], which support a systematic analysis in intrinsic estimation theory.
From the above, we can conclude that the Rao distance and the KL divergence are widely accepted as fundamental quantities for the intrinsic losses. The former is well-defined on statistical manifolds with their usual Riemannian structure, while the latter is well known in information theory and is also associated with the canonical divergence in Amari's dual geometries [9,19]. In this paper, we revisit two intrinsic losses that are given by the squared Rao distance and the symmetric KL divergence (also called the Jeffreys divergence), by integrating them in the framework of information geometry. For exponential families, they are shown to be uniformly described as the energy difference along an α-geodesic path on the statistical manifold, for some α ∈ {−1, 0, 1}. Additionally, the Jeffreys prior, one of the most popular noninformative priors, corresponds to the Riemannian volume in this setting [20]. Elucidating these geometrical meanings helps deepen our understanding of, and capabilities in, intrinsic Bayesian analysis. Subsequently, we apply these two intrinsic losses to develop Bayesian approaches to covariance matrix estimation and range-spread target detection, which are both active topics in radar signal processing. The sample covariance matrix (the maximum likelihood estimator (MLE) using a set of zero-mean Gaussian samples) is proven to be intrinsically biased [18]. We provide a Bayesian approach to estimate the scale factor of the sample covariance matrix, which leads to an intrinsically unbiased and asymptotically efficient covariance estimator. The detection of a range-spread target is, in essence, a decision problem. Intrinsic analysis supplies a novel way to derive a decision rule that is consistent under different coordinate systems of measurement. The resulting detectors are equivalent to a classical result based on the generalized likelihood ratio principle, which offers an intuitive interpretation of the geometrical nature of the latter.
In the next section, we review some basic facts about information geometry and the exponential family. In Section 3, we present the intrinsic Bayesian approaches to point estimation and hypothesis testing. In Section 4, some applications to covariance matrix estimation and range-spread target detection are studied in detail. Section 5 concludes.

The Fisher Metric and the α-Connections
Assume a statistical model $S = \{p(x|\xi) : \xi \in \Xi\}$ that describes the probabilistic behavior of a random variable $x$, with the parameter space $\Xi$ being an open subset of $\mathbb{R}^n$. Under certain regularity conditions, $S$ can be regarded as a differentiable manifold, with $\xi = [\xi_1, \ldots, \xi_n]$ being a coordinate system. The manifold $S$ carries the Riemannian structure induced by the Fisher information matrix

$$G(\xi) = [g_{ij}(\xi)], \qquad g_{ij}(\xi) = E_\xi\left[\frac{\partial \log p(x|\xi)}{\partial \xi_i}\,\frac{\partial \log p(x|\xi)}{\partial \xi_j}\right], \tag{1}$$

where $E_\xi$ signifies the expectation with respect to $p(x|\xi)$. In this geometric setting, $G$ plays the key role of a metric tensor [11], called the Fisher metric. Then, the infinitesimal squared distance between two closely-spaced distributions $p(x|\xi)$ and $p(x|\xi + d\xi)$ is defined as [21]

$$ds^2 = \sum_{i,j=1}^{n} g_{ij}(\xi)\, d\xi_i\, d\xi_j.$$
For a smooth manifold, the notion of an affine connection permits a covariant differential calculus on its tangent bundle. A statistical manifold $S$ possesses a one-parameter family of affine connections related to the Fisher metric, named the α-connections (see [9,22]). For an arbitrary real number $\alpha$, the α-connection $\nabla^{(\alpha)}$ can be given by the Christoffel symbols of the first kind

$$\Gamma^{(\alpha)}_{ij,k}(\xi) = E_\xi\left[\left(\partial_i \partial_j \ell + \frac{1-\alpha}{2}\,\partial_i \ell\,\partial_j \ell\right)\partial_k \ell\right], \tag{2}$$

for $i, j, k = 1, \ldots, n$, where $\ell = \log p(x|\xi)$ and $\partial_i = \partial/\partial \xi_i$. In particular, the 0-connection $\nabla^{(0)}$ is exactly the Riemannian (or Levi-Civita) connection with respect to the Fisher metric, and the exponential connection $\nabla^{(1)}$ and the mixture connection $\nabla^{(-1)}$ also have useful applications in statistical analysis (e.g., [23,24]). As detailed in [9], the geometric structure given by the Fisher metric and the α-connections obeys the following two invariance principles:
(1) It is invariant under one-to-one reparameterizations.
(2) It is invariant under reduction to sufficient statistics.

Geometric Structure of Exponential Family
An exponential family is a statistical model $S$ consisting of probability densities of the form

$$p(x|\theta) = \exp\left\{C(x) + \sum_{i=1}^{n} \theta_i F_i(x) - \psi(\theta)\right\}, \tag{3}$$

where $C(x)$ and $F_i(x)$, $i = 1, \ldots, n$, are real-valued functions of $x$, $\theta = [\theta_1, \ldots, \theta_n]$ are the so-called natural parameters, and $\psi(\theta)$ corresponds to the normalization constant. In addition, the $n + 1$ functions $1, F_1(x), \ldots, F_n(x)$ are generally required to be linearly independent [8]. The exponential families include many common statistical models, such as the Gaussian, Poisson, Bernoulli, Gamma, and Dirichlet distributions.
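For an exponential family $S$, let us investigate its geometric structure with respect to the Fisher metric $g$ and the α-connection $\nabla^{(\alpha)}$. From Equation (3), we can obtain

$$\partial_i \log p(x|\theta) = F_i(x) - \partial_i \psi(\theta), \qquad \partial_i \partial_j \log p(x|\theta) = -\partial_i \partial_j \psi(\theta), \tag{4}$$

where $\partial_i = \partial/\partial\theta_i$. Note that in information geometry, the differentiation and integration of $p(x|\theta)$ are customarily assumed to be interchangeable [9]. We can easily verify that an exponential family satisfies this assumption by reference to [25]. Under this assumption, $E_\theta[\partial_i \log p\,\partial_j \log p] = -E_\theta[\partial_i \partial_j \log p]$, so from Equations (1) and (4) the components of $g$ can be written as

$$g_{ij}(\theta) = \partial_i \partial_j \psi(\theta). \tag{5}$$

Further computations by Equation (2) give the expressions for $\nabla^{(\alpha)}$, namely $\Gamma^{(\alpha)}_{ij,k}(\theta) = \frac{1-\alpha}{2}\,\partial_i \partial_j \partial_k \psi(\theta)$. In particular, we have $\Gamma^{(1)}_{ij,k} = 0$ for $i, j, k = 1, \ldots, n$.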
In essence, the differential geometry with respect to the Fisher metric and the 0-connection coincides with the usual Riemannian geometry. It is difficult to derive a general theory of the Riemannian geometric structure that covers all exponential families. Individually, however, the properties of univariate and multivariate Gaussian distributions are well studied under the framework of Riemannian geometry in [26].
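Amari and Nagaoka [9,19] propose analyzing the geometric structures of statistical models through duality. Typically, an exponential family $S$ has a dually flat structure $(g, \nabla^{(\pm 1)})$. That is to say, $S$ is flat with respect to the dual connections $\nabla^{(1)}$ and $\nabla^{(-1)}$. On the dually flat space $(S, g, \nabla^{(\pm 1)})$, the natural parameters $\theta$ constitute the 1-affine coordinate system, while the (−1)-affine coordinate system can be given by the expectation parameters $\eta = [\eta_1, \ldots, \eta_n]$ with $\eta_i = E_\theta[F_i(x)]$. Furthermore, we can prove that the two potential functions provided by $\psi(\theta)$ and $\varphi(\eta) = \theta \cdot \eta - \psi(\theta)$ satisfy

$$\frac{\partial \psi}{\partial \theta_i} = \eta_i, \qquad \frac{\partial \varphi}{\partial \eta_i} = \theta_i, \qquad \frac{\partial^2 \psi}{\partial \theta_i \partial \theta_j} = g_{ij}, \qquad \frac{\partial^2 \varphi}{\partial \eta_i \partial \eta_j} = g^{ij}, \tag{6}$$

$$D(p, q) = \psi(\theta_q) + \varphi(\eta_p) - \theta_q \cdot \eta_p, \tag{7}$$

where $g^{ij}$ is the $(i, j)$-th entry of the inverse Fisher information matrix $G^{-1}$, $\theta_q$ are the 1-affine coordinates of $q \in S$, $\eta_p$ are the (−1)-affine coordinates of $p \in S$, and $D(p, q)$ denotes the KL divergence from $p$ to $q$.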

α-Geodesics
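In the presence of an affine connection, geodesics are defined to be curves whose tangent vectors remain parallel when transported along the curve. Specifically, an α-geodesic $\gamma(t) = [\gamma_1(t), \ldots, \gamma_n(t)]$ is characterized by the equation

$$\ddot{\gamma}_k(t) + \sum_{i,j=1}^{n} \Gamma^{(\alpha)k}_{ij}\big(\gamma(t)\big)\,\dot{\gamma}_i(t)\,\dot{\gamma}_j(t) = 0, \qquad k = 1, \ldots, n, \tag{8}$$

where $\dot{\gamma}$ denotes the derivative with respect to $t$ and $\Gamma^{(\alpha)k}_{ij} = \sum_{l} g^{kl}\,\Gamma^{(\alpha)}_{ij,l}$ are the Christoffel symbols of the second kind.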
It is well known in differential geometry that the (locally) shortest curve between two points of the manifold, if it exists, is a 0-geodesic (namely, the geodesic of usual Riemannian geometry; e.g., [27]). The 0-geodesic is usually said to have "constant speed", since its velocity field has constant Riemannian norm. Under certain boundary conditions, when α = 0, an explicit solution to Equation (8) for univariate or multivariate Gaussian distributions has been derived in [26,28,29]. However, it seems complicated to give a general closed form of the 0-geodesics for all exponential families.
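From the viewpoint of flatness, the manifold $S$ of an exponential family is dually flat with respect to $\nabla^{(\pm 1)}$. It is known from [19] that any (−1)- or 1-geodesic on $S$ is a straight line with respect to the corresponding affine coordinate system. Thus, in terms of the $\theta$ and $\eta$ coordinates, the ±1-geodesics $\gamma^{(1)}_{p,q}$ and $\gamma^{(-1)}_{p,q}$ connecting any $p, q \in S$ can be expressed by

$$\gamma^{(1)}_{p,q}:\ \theta(t) = (1-t)\,\theta_p + t\,\theta_q, \qquad \gamma^{(-1)}_{p,q}:\ \eta(t) = (1-t)\,\eta_p + t\,\eta_q, \tag{9}$$

with the parameter $t \in [0, 1]$. Note that in Equation (9), the velocity fields $\dot{\theta}$ and $\dot{\eta}$ are constant along $t$, but $\gamma^{(1)}_{p,q}$ and $\gamma^{(-1)}_{p,q}$ do not have constant speed with respect to the Fisher metric.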
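Example 1 (Univariate Gaussian Distribution). The probability density of a univariate Gaussian distribution with mean $\mu$ and variance $\sigma^2$ can be expressed as

$$p(x|\mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\} = \exp\left\{\frac{\mu}{\sigma^2}\,x - \frac{1}{2\sigma^2}\,x^2 - \frac{\mu^2}{2\sigma^2} - \log\big(\sqrt{2\pi}\,\sigma\big)\right\},$$

which has the form of Equation (3) with $F(x) = [x, x^2]$ and natural parameters $\theta = [\mu/\sigma^2, -1/(2\sigma^2)]$. Further, we can compute the expectation parameters $\eta = E[F(x)] = [\mu, \mu^2 + \sigma^2]$.

[Figure: panels drawn under (a) the 1-affine coordinate system θ and (b) the (−1)-affine coordinate system η.]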

The Length and Energy of a Curve
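The length of a piecewise smooth curve $\gamma(t) = [\xi_1(t), \ldots, \xi_n(t)]$, $t \in [a, b]$, is defined as

$$L(\gamma) = \int_a^b \sqrt{\sum_{i,j=1}^{n} g_{ij}\big(\xi(t)\big)\,\dot{\xi}_i(t)\,\dot{\xi}_j(t)}\; dt. \tag{10}$$

On some occasions, it may be more convenient to work with another related quantity termed the "energy function", which is given by integrating the Fisher information along the curve:

$$E(\gamma) = \int_a^b \sum_{i,j=1}^{n} g_{ij}\big(\xi(t)\big)\,\dot{\xi}_i(t)\,\dot{\xi}_j(t)\; dt. \tag{11}$$

Minimizing energy turns out to be equivalent to minimizing length, and both lead to the 0-geodesics on the manifold.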
Remark 2. On a manifold, both the length and energy of a curve γ(t) do not change under different coordinate systems or model parameterizations, but the latter will depend on the curve parametrization t → γ(t) [30].
Now we discuss the lengths and energies of the 0- and ±1-geodesics on the manifold $S$ of an exponential family. If there exists a unique 0-geodesic connecting $p, q \in S$, the length of this 0-geodesic segment equals the well-known Rao distance between $p$ and $q$ [26], written as $\rho(p, q)$. From Remark 2, the energy of a curve is meaningless without specifying its parametrization. What we mainly consider in this paper are the 0- and ±1-geodesics parameterized by $[0, 1]$. Since the 0-geodesics have constant speed, the integrands in Equations (10) and (11) are constant over $t$. Thus, given a 0-geodesic $\gamma^{(0)}_{p,q} : [0, 1] \to S$ connecting $p$ and $q$, its length and energy have the following relationship [30]:

$$E\big(\gamma^{(0)}_{p,q}\big) = L^2\big(\gamma^{(0)}_{p,q}\big) = \rho^2(p, q). \tag{12}$$

The ±1-geodesics on $S$ have explicit expressions, shown in Equation (9). Furthermore, the theorem given below describes their energy functions.

Theorem 3. The energies of the ±1-geodesics $\gamma^{(\pm 1)}_{p,q}$ given in Equation (9) are equal to the Jeffreys divergence $J(p, q)$ between $p$ and $q$.
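Proof. By using Equation (5), the energy of $\gamma^{(1)}_{p,q}$ connecting $p = \theta(0)$ and $q = \theta(1)$ is

$$E\big(\gamma^{(1)}_{p,q}\big) = \int_0^1 \sum_{i,j=1}^{n} \partial_i \partial_j \psi\big(\theta(t)\big)\,\dot{\theta}_i\,\dot{\theta}_j\; dt = \int_0^1 \sum_{i=1}^{n} \dot{\eta}_i(t)\,\dot{\theta}_i\; dt = (\theta_q - \theta_p) \cdot (\eta_q - \eta_p),$$

where the last two equalities follow from Equations (6) and (9): $\eta_i = \partial_i \psi(\theta)$ gives $\dot{\eta}_i = \sum_j \partial_i \partial_j \psi\,\dot{\theta}_j$, and $\dot{\theta} = \theta_q - \theta_p$ is constant. On the other hand, according to the expression of $D(p, q)$ in Equation (7), we have

$$J(p, q) = D(p, q) + D(q, p) = (\theta_q - \theta_p) \cdot (\eta_q - \eta_p).$$

Then, $E\big(\gamma^{(1)}_{p,q}\big) = J(p, q)$. Similarly, we can prove the other half of this theorem.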
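As a numerical illustration of Theorem 3, the following sketch integrates the Fisher information along a 1-geodesic of the univariate Gaussian family and compares it with the Jeffreys divergence. It assumes the parametrization $F(x) = [x, x^2]$, so that $\theta = [\mu/\sigma^2, -1/(2\sigma^2)]$ and $\psi(\theta) = -\theta_1^2/(4\theta_2) + \tfrac{1}{2}\log(-\pi/\theta_2)$; all function names are hypothetical and introduced only for this example.

```python
import numpy as np
from scipy.integrate import quad


def natural_params(mu, var):
    # Assumed parametrization: theta = [mu / var, -1 / (2 var)] for F(x) = [x, x^2].
    return np.array([mu / var, -0.5 / var])


def fisher_metric(theta):
    # Hessian of psi(theta) = -theta_1^2 / (4 theta_2) + 0.5 * log(-pi / theta_2).
    t1, t2 = theta
    off = 0.5 * t1 / t2**2
    return np.array([[-0.5 / t2, off],
                     [off, -0.5 * t1**2 / t2**3 + 0.5 / t2**2]])


def energy_1_geodesic(theta_p, theta_q):
    # Energy of the straight line theta(t) = (1 - t) theta_p + t theta_q, t in [0, 1].
    d = theta_q - theta_p
    value, _ = quad(lambda t: d @ fisher_metric(theta_p + t * d) @ d, 0.0, 1.0)
    return value


def kl_gauss(mu1, var1, mu2, var2):
    # Closed-form KL divergence from N(mu1, var1) to N(mu2, var2).
    return 0.5 * (np.log(var2 / var1) + (var1 + (mu1 - mu2)**2) / var2 - 1.0)


def jeffreys_gauss(mu1, var1, mu2, var2):
    # Jeffreys divergence as the sum of the two directed KL divergences.
    return kl_gauss(mu1, var1, mu2, var2) + kl_gauss(mu2, var2, mu1, var1)


# Example pair: p = N(0, 1), q = N(1, 4); the two numbers should agree.
energy = energy_1_geodesic(natural_params(0.0, 1.0), natural_params(1.0, 4.0))
print(energy, jeffreys_gauss(0.0, 1.0, 1.0, 4.0))
```

The agreement of the two printed values is only a sanity check for one family and one pair of points; it does not replace the proof above.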
As an example, on the manifold of univariate Gaussian distributions, Figure 2 shows the energy differences along the α-geodesics parameterized by $[0, 1]$, for α = −1, 0, 1. As noted above, the 0-geodesic path follows the minimum energy variation, which is also demonstrated in Figure 2. Below is a straightforward corollary of Theorem 3.

Corollary 4. If $p$ and $q$ belong to the same exponential family, then $J(p, q) \geq \rho^2(p, q)$.
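Corollary 4 can be checked numerically for univariate Gaussians. The sketch below relies on an assumption not spelled out in the text: in the $(\mu, \sigma)$ coordinates the Fisher metric is $ds^2 = (d\mu^2 + 2\,d\sigma^2)/\sigma^2$, which is twice the Poincaré half-plane metric in $(\mu/\sqrt{2}, \sigma)$ and therefore yields a closed-form Rao distance. Function names are hypothetical.

```python
import numpy as np


def rao_distance_gauss(mu1, sigma1, mu2, sigma2):
    # sqrt(2) times the hyperbolic half-plane distance in (mu / sqrt(2), sigma).
    num = 0.5 * (mu1 - mu2)**2 + (sigma1 - sigma2)**2
    return np.sqrt(2.0) * np.arccosh(1.0 + num / (2.0 * sigma1 * sigma2))


def jeffreys_gauss(mu1, sigma1, mu2, sigma2):
    # Sum of the two directed KL divergences; the log terms cancel.
    v1, v2 = sigma1**2, sigma2**2
    d2 = (mu1 - mu2)**2
    return 0.5 * ((v1 + d2) / v2 + (v2 + d2) / v1) - 1.0


# Compare J(p, q) with rho^2(p, q) on randomly drawn pairs of Gaussians.
rng = np.random.default_rng(0)
pairs = np.column_stack([rng.uniform(-3, 3, 1000), rng.uniform(0.2, 3, 1000),
                         rng.uniform(-3, 3, 1000), rng.uniform(0.2, 3, 1000)])
gaps = np.array([jeffreys_gauss(*row) - rao_distance_gauss(*row)**2 for row in pairs])
print(gaps.min())  # expected to be nonnegative up to rounding
```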

Two Intrinsic Losses
Suppose that the observed data $x$ are generated by $p(x|\xi)$, for some $\xi \in \Xi$. Let $\xi_a$ be the actual (unknown) value of the parameter, and $\xi_g \in \Xi$ be a given value which may represent an estimator or a hypothesis. For the problems of statistical inference, we define a loss function $l(\xi_a, \xi_g)$, aiming at measuring the consequences of estimating $\xi_a$ by $\xi_g$ in statistical estimation [5] or judging the compatibility of $\xi_g$ with the observations $x$ in hypothesis testing [3].
Many conventional loss functions (such as the squared error loss and the zero-one loss) are defined to compare $\xi_a$ and $\xi_g$ in the parameter space, while an intrinsic loss function directly measures the dissimilarity between $p(x|\xi_a)$ and $p(x|\xi_g)$. As is customary, the intrinsic losses are required to be invariant under one-to-one transformations of either $x$ or $\xi$ (e.g., [2,5]). Let $S = \{p(x|\xi) : \xi \in \Xi\}$ form an $n$-dimensional manifold. From the viewpoint of information geometry, the aforementioned criteria are satisfied if a loss function has a well-defined geometrical meaning. Therefore, based on the contents of Section 2.4, we will consider the following two intrinsic losses:
(i) Intrinsic loss based on the squared Rao distance (hereafter referred to as the Rao loss):

$$l_R(\xi_a, \xi_g) = \rho^2(\xi_a, \xi_g),$$

which stands for the energy difference along a 0-geodesic connecting $\xi_a$ and $\xi_g$ on $S$ and parameterized by $[0, 1]$.
(ii) Intrinsic loss based on the Jeffreys divergence (hereafter referred to as the Jeffreys loss):

$$l_J(\xi_a, \xi_g) = J(\xi_a, \xi_g) = D(\xi_a, \xi_g) + D(\xi_g, \xi_a),$$

which stands for the energy difference along a (−1)- or 1-geodesic connecting $\xi_a$ and $\xi_g$ on $S$ and parameterized by $[0, 1]$ if $S$ is an exponential family.
Since the Jeffreys divergence is a symmetrized KL divergence, it has all the properties of a metric as defined in topology except the triangle inequality, and is thus not termed a distance. In recent years, this quantity has sometimes been used for model selection [31,32]. There exists another symmetrized version of the KL divergence, proposed in [2,5] for intrinsic analysis, which takes the minimum of $D(\xi_a, \xi_g)$ and $D(\xi_g, \xi_a)$ and thus copes better with the case when $p(x|\xi_a)$ and $p(x|\xi_g)$ have nested supports. Even so, we prefer to adopt the Jeffreys loss $l_J$ in this paper since it has a better-understood geometrical interpretation.

Priors
Apart from the loss function, an appropriate choice of the prior distribution plays an equally important role in Bayesian analysis. A particular type of prior has been developed to cope with noninformative settings, where the term "noninformative" expresses that a prior distribution in Bayes theory is expected to have minimal effect on the posterior inference [5]. A common noninformative prior for the Bayesian paradigm is the Jeffreys prior, which is proportional to the Riemannian volume corresponding to the Fisher metric [20]. As we have learned from the geometric theory, this is an intrinsic concept and thus does not depend on the model parametrization.
In many early works, the Jeffreys priors are not recommended for multiparameter cases. Bernardo [33] initiated the reference priors in Bayesian inference (also known as objective Bayesian inference), which are widely accepted for various multiparameter Bayesian problems. However, in most multidimensional cases, the analytical evaluations related to a reference prior appear to be quite cumbersome. Thus, some numerical algorithms have been designed to carry out the computations in objective Bayesian inference (e.g., [34,35]), which seems unfavorable for the theoretical derivations of this paper. Additionally, in Section 4, the applications of Jeffreys priors yield an appropriate scaling of the eigenvalues of the sample covariance matrix, and also reproduce the classic constant false alarm rate (CFAR) detector in radar detection theory. Hence, we believe that the Jeffreys prior can still lead to meaningful results in intrinsic Bayesian analysis.

Intrinsic Bayesian Analysis
Let $p(\xi|x) \propto p(x|\xi)\,p(\xi)$ be the posterior distribution with respect to the prior $p(\xi)$. From a Bayesian viewpoint, the corresponding posterior expected loss is

$$r(\xi_g, x) = \int_\Xi l(\xi, \xi_g)\, p(\xi|x)\, d\xi.$$

Considering the invariance criteria, we take the loss function as $l_R$ or $l_J$ and use the Jeffreys prior $\pi_J$. This paper will discuss two aspects of intrinsic Bayesian inference: point estimation and hypothesis testing.
As formulated in [3], a hypothesis testing problem is to decide whether the observed data $x$ are compatible with the null hypothesis. Specifically, when the null hypothesis contains only one value $\xi_0$, an intrinsic Bayesian approach to this hypothesis test is based on the positive statistic $r(\xi_0, x)$: the null hypothesis is rejected if $r(\xi_0, x) > \gamma$ for some threshold $\gamma$. For a composite null hypothesis, it is suggested to take $\min_{\xi_0 \in \Xi_0} r(\xi_0, x)$ as the test statistic, where $\Xi_0$ denotes the null parameter space.
For a point estimation problem, the Bayes estimator of $\xi$ is obtained by minimizing the posterior expected loss function:

$$\hat{\xi}(x) = \arg\min_{\xi_g \in \Xi} r(\xi_g, x),$$

and the resulting estimator $\hat{\xi}(x)$ is invariant under invertible transformations of either $x$ or $\xi$. In addition, the properties of $\hat{\xi}(x)$ can be further analyzed using intrinsic estimation theory.
Before finishing this subsection, it is convenient to introduce some basic concepts of intrinsic estimation theory. Fix a $\xi \in \Xi$ and let $T_\xi S$ be the tangent space at $\xi$. For a given connection $\nabla$, the corresponding geodesic with starting point $\xi$ and initial direction $v \in T_\xi S$ is denoted by $\gamma_v(t)$, $t \in [0, 1]$. Such a curve exists when $v$ belongs to a small neighborhood of the origin of $T_\xi S$. In this case, the exponential map is defined as $\mathrm{Exp}_\xi(v) = \gamma_v(1)$. Given an estimator $\hat{\xi}(x)$, the estimator vector field is induced on $S$ through the inverse of the exponential map: $A_\xi(x) = \mathrm{Exp}_\xi^{-1}(\hat{\xi}(x))$. Further, the bias vector field is defined as

$$B_\xi = E_\xi\big[A_\xi(x)\big] = E_\xi\big[\mathrm{Exp}_\xi^{-1}(\hat{\xi}(x))\big].$$

If $B_\xi = 0$ for any $\xi \in \Xi$, the estimator $\hat{\xi}(x)$ is called intrinsically unbiased. Obviously, the definitions above depend on the specific choice of $\nabla$ [18]. When a flat connection is considered, an estimator is intrinsically unbiased if and only if it is unbiased under the corresponding affine coordinate system. Generally speaking, however, the notion of intrinsic unbiasedness is widely acknowledged only in the Riemannian case (namely, when $\nabla$ is the Riemannian connection). Thus, in this paper, when we say that an estimator is intrinsically unbiased, we mean that this is true with regard to the 0-connection. The intrinsic MSE is defined by the mean square of the Rao distance

$$E\big[\rho^2(\hat{\xi}(x), \xi)\big], \tag{14}$$

where, in the Bayesian sense, $E$ denotes the expectation taken over both $x$ and $\xi$. An intrinsic version of the CRB gives a lower bound on the intrinsic MSE performance of any intrinsically unbiased or biased estimator. The relevant developments can be found in [17,18,36].

Covariance Estimation
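Let $x_1, \ldots, x_n$ be a random sample of size $n$ from a zero-mean $p$-variate Gaussian distribution with unknown covariance matrix $\Sigma$. Consider the Jeffreys prior distribution for $\Sigma$ ([37], p. 426),

$$\pi_J(\Sigma) \propto |\Sigma|^{-(p+1)/2}.$$

A sufficient statistic is provided by

$$S = \frac{1}{n}\sum_{i=1}^{n} x_i x_i^T,$$

which is recognized as the sample covariance matrix of the data set [38]. Assume that $n > p$; then, $S$ is positive definite with probability one. Thus, the corresponding posterior distribution based on the Jeffreys prior is easily found to be

$$\pi(\Sigma|S) \propto |\Sigma|^{-(n+p+1)/2}\exp\left\{-\frac{n}{2}\,\mathrm{tr}\big(\Sigma^{-1}S\big)\right\},$$

which is an inverse Wishart distribution with $n$ degrees of freedom and scale matrix $nS$. We now restrict attention to the family of covariance estimators of the form $\Sigma_g = aS$, $a \in (0, +\infty)$, which contains the MLE $S$ (for $a = 1$). The Rao distance between $\Sigma_1$ and $\Sigma_2$ is given by

$$\rho(\Sigma_1, \Sigma_2) = \sqrt{\frac{1}{2}\sum_{i=1}^{p}\log^2 \lambda_i\big(\Sigma_1^{-1}\Sigma_2\big)}, \tag{15}$$

where $\lambda_i(\cdot)$ denotes the $i$-th eigenvalue, while the Jeffreys divergence is [40]

$$J(\Sigma_1, \Sigma_2) = \frac{1}{2}\,\mathrm{tr}\big(\Sigma_1^{-1}\Sigma_2 + \Sigma_2^{-1}\Sigma_1\big) - p.$$

Taking the intrinsic losses into account, we can obtain an intrinsic estimator by minimizing the posterior expected Rao loss function:

$$\hat{\Sigma}_R = \hat{a}_R S, \qquad \hat{a}_R = \arg\min_{a > 0} E_{\Sigma|S}\big[\rho^2(\Sigma, aS)\big], \tag{16}$$

where $E_{\Sigma|S}$ signifies the expectation with respect to $\pi(\Sigma|S)$. By Equation (15), we have

$$\rho^2(\Sigma, aS) = \frac{1}{2}\sum_{i=1}^{p}\big(\log a + \log \lambda_i\big)^2,$$

where $\lambda_i$, $i = 1, \ldots, p$, denote the $p$ eigenvalues of the matrix $\Sigma^{-1}S$. In fact, it proves quite difficult to evaluate the conditional expectation in Equation (16) in closed form. However, if we let $c = \log a$, it is easy to show that the objective function in Equation (16) is strictly convex with respect to $c$.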
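Thus, the minimum point can be found from the stationarity condition. Write the posterior expected Rao loss function as

$$r(c) = \frac{1}{2}\,E_{\Sigma|S}\left[\sum_{i=1}^{p}\big(c + \log \lambda_i\big)^2\right].$$

Then, let $dr/dc = 0$. By Leibniz's rule [41], we can interchange the integral and differential operators, obtaining

$$p\,c + E_{\Sigma|S}\big[\log |\Sigma^{-1}S|\big] = 0. \tag{17}$$

Since $\Sigma|S \sim \text{Inverse Wishart}(n, nS)$, we have $nS^{1/2}\Sigma^{-1}S^{1/2} \sim \text{Wishart}(n, I_p)$ and $|nS^{1/2}\Sigma^{-1}S^{1/2}| = n^p\,|\Sigma^{-1}S|$, so we can calculate the conditional expectation in Equation (17) by the law of the log-determinant of a Wishart matrix [42]:

$$E_{\Sigma|S}\big[\log |\Sigma^{-1}S|\big] = \sum_{i=1}^{p}\psi\left(\frac{n-i+1}{2}\right) + p\log 2 - p\log n,$$

where $\psi(\cdot)$ is the well-known digamma function defined as $\psi(x) = \frac{d}{dx}\log \Gamma(x)$. Thus, from Equation (17), the intrinsic covariance estimator based on the Rao loss is $\hat{\Sigma}_R = \hat{a}_R S$, with

$$\hat{a}_R = \frac{n}{2}\exp\left\{-\frac{1}{p}\sum_{i=1}^{p}\psi\left(\frac{n-i+1}{2}\right)\right\}. \tag{18}$$

Alternatively, if we consider the Jeffreys loss function $l_J$, then

$$\hat{a}_J = \arg\min_{a > 0} E_{\Sigma|S}\big[J(\Sigma, aS)\big] = \arg\min_{a > 0}\left\{\frac{a}{2}\,E_{\Sigma|S}\,\mathrm{tr}\big(\Sigma^{-1}S\big) + \frac{1}{2a}\,E_{\Sigma|S}\,\mathrm{tr}\big(S^{-1}\Sigma\big) - p\right\}.$$

It is easy to calculate that $E_{\Sigma|S}\,\mathrm{tr}(\Sigma^{-1}S) = p$, and when $n > p + 1$,

$$E_{\Sigma|S}\,\mathrm{tr}\big(S^{-1}\Sigma\big) = \frac{np}{n-p-1}.$$

Therefore, the intrinsic covariance estimator based on the Jeffreys loss is $\hat{\Sigma}_J = \hat{a}_J S$, with $\hat{a}_J = \sqrt{n/(n-p-1)}$.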
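The two scale factors are cheap to evaluate numerically. The sketch below is an illustration only: it assumes the closed forms $\hat{a}_R = \frac{n}{2}\exp\{-\frac{1}{p}\sum_{i=1}^{p}\psi(\frac{n-i+1}{2})\}$ and $\hat{a}_J = \sqrt{n/(n-p-1)}$, which follow from the stationarity conditions of the two posterior expected losses under the inverse Wishart posterior. The function names are hypothetical.

```python
import numpy as np
from scipy.special import digamma


def scale_rao(n, p):
    # Assumed closed form: (n / 2) * exp(-mean_i digamma((n - i + 1) / 2)), i = 1..p.
    i = np.arange(1, p + 1)
    return 0.5 * n * np.exp(-np.mean(digamma(0.5 * (n - i + 1))))


def scale_jeffreys(n, p):
    # Assumed closed form sqrt(n / (n - p - 1)); the posterior mean of Sigma needs n > p + 1.
    if n <= p + 1:
        raise ValueError("scale_jeffreys requires n > p + 1")
    return np.sqrt(n / (n - p - 1))


def expected_jeffreys_loss(a, n, p):
    # Posterior expected Jeffreys loss of a*S, using E tr(Sigma^{-1} S) = p
    # and E tr(S^{-1} Sigma) = n p / (n - p - 1).
    return 0.5 * (a * p + n * p / ((n - p - 1) * a)) - p


# Scale factors for p = 10 and a few (arbitrarily chosen) sample sizes.
for n in (12, 20, 50, 100, 1000):
    print(n, scale_rao(n, 10), scale_jeffreys(n, 10))
```

Both factors exceed one and approach one as $n$ grows, which is consistent with the MLE $S$ being asymptotically efficient.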
In Table 1, the scale factors $\hat{a}_R$ and $\hat{a}_J$ are evaluated for various values of $n$ and $p = 10$. We now examine the bias and efficiency of these two intrinsic estimators.

Proposition 5. Let $W \sim \text{Wishart}(v, I_p)$ with $v \geq p$; then $E[\log^k(W)] = m I_p$, where $m$ is a constant determined by $p$, $k$, and $v$.
Proof. For a $p \times p$ unitary matrix $U$, the random matrix $UWU^T$ follows the same distribution as $W$. Thus,

$$E\big[\log^k(W)\big] = E\big[\log^k(UWU^T)\big].$$

Recall the important fact that $\log(A^{-1}BA) = A^{-1}\log(B)A$, where $A$ and $B$ are two nonsingular matrices. Thus, a further step yields

$$E\big[\log^k(W)\big] = U\,E\big[\log^k(W)\big]\,U^T.$$

Note that the equality above holds for an arbitrary unitary matrix $U$; then, we can conclude that $E[\log^k(W)]$ must be a scalar multiple of the identity matrix.

Theorem 6. The covariance estimator $\hat{\Sigma}_R$ given by Equation (18) is intrinsically unbiased.
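Proof. For a $p$-dimensional positive-definite matrix $X$, the inverse of the exponential map defined on the manifold of positive-definite matrices is given by [43]

$$\exp_\Sigma^{-1}(X) = \Sigma^{1/2}\log\big(\Sigma^{-1/2}X\Sigma^{-1/2}\big)\Sigma^{1/2}.$$

Thus, we have

$$E_\Sigma\big[\exp_\Sigma^{-1}(\hat{\Sigma}_R)\big] = \Sigma^{1/2}\left\{\log\left(\frac{\hat{a}_R}{n}\right) I_p + E_\Sigma\Big[\log\big(n\Sigma^{-1/2}S\Sigma^{-1/2}\big)\Big]\right\}\Sigma^{1/2}. \tag{19}$$

Generally, it is difficult to directly evaluate the expectation of the logarithm of a Wishart matrix. However, since $n\Sigma^{-1/2}S\Sigma^{-1/2} \sim \text{Wishart}(n, I_p)$, by Proposition 5, the expectation in Equation (19) has the form

$$E_\Sigma\Big[\log\big(n\Sigma^{-1/2}S\Sigma^{-1/2}\big)\Big] = m I_p, \tag{20}$$

with a certain constant $m$. In addition, since $\mathrm{tr}\log(W) = \log|W|$ for a positive-definite matrix $W$, we can obtain

$$m = \frac{1}{p}\,E_\Sigma\Big[\log\big|n\Sigma^{-1/2}S\Sigma^{-1/2}\big|\Big] = \frac{1}{p}\sum_{i=1}^{p}\psi\left(\frac{n-i+1}{2}\right) + \log 2. \tag{21}$$

Substituting Equations (18), (20) and (21) into Equation (19), we have $E_\Sigma[\exp_\Sigma^{-1}(\hat{\Sigma}_R)] = 0$, which indicates that the covariance estimator $\hat{\Sigma}_R$ is intrinsically unbiased.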
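The intrinsic unbiasedness claimed in Theorem 6 can be probed by simulation. The sketch below is a rough Monte Carlo check under stated assumptions: it takes $\Sigma = I_p$ (no loss of generality, because $\Sigma^{-1/2}S\Sigma^{-1/2}$ has the same distribution for every $\Sigma$), uses the affine-invariant inverse exponential map $\exp_\Sigma^{-1}(X) = \Sigma^{1/2}\log(\Sigma^{-1/2}X\Sigma^{-1/2})\Sigma^{1/2}$, and assumes the digamma expression for $\hat{a}_R$. The function names, sample sizes, and seed are hypothetical choices.

```python
import numpy as np
from scipy.special import digamma


def scale_rao(n, p):
    # Assumed closed form of the Rao-loss scale factor.
    i = np.arange(1, p + 1)
    return 0.5 * n * np.exp(-np.mean(digamma(0.5 * (n - i + 1))))


def sym_logm(A):
    # Matrix logarithm of a symmetric positive-definite matrix via eigendecomposition.
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.T


def mean_log_residual(a, n, p, trials=2000, seed=0):
    # Monte Carlo estimate of E[exp_I^{-1}(a S)] = E[log(a S)] when Sigma = I_p.
    rng = np.random.default_rng(seed)
    acc = np.zeros((p, p))
    for _ in range(trials):
        X = rng.standard_normal((n, p))      # n zero-mean samples with covariance I_p
        acc += sym_logm(a * (X.T @ X) / n)   # log of the scaled sample covariance
    return acc / trials


n, p = 20, 3
bias_rao = mean_log_residual(scale_rao(n, p), n, p)
bias_mle = mean_log_residual(1.0, n, p)
print(np.round(bias_rao, 3))  # entries should be close to zero
print(np.round(bias_mle, 3))  # diagonal should be visibly negative
```

Up to Monte Carlo error, the first matrix vanishes while the second has a negative diagonal, which matches the statement that the MLE $S$ is intrinsically biased.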
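Given a real number $a > 0$, the posterior expected Rao loss of $aS$ can be computed by

$$E_{\Sigma|S}\big[l_R(\Sigma, aS)\big] = \frac{1}{2}\,E_{\Sigma|S}\left[\sum_{i=1}^{p}\log^2(a\lambda_i)\right] = \frac{1}{2}\,\mathrm{tr}\,E_{\Sigma|S}\Big[\log^2\big(a\,S^{1/2}\Sigma^{-1}S^{1/2}\big)\Big].$$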
Using Proposition 5 again, we know that the right-hand side of the second equality above is constant on the sample space. Thus, the posterior expected Rao losses of $\hat{\Sigma}_R$, $\hat{\Sigma}_J$, and $S$ are equal to their Bayes risks under the Rao loss, which average the Rao loss over both the sample space and the parameter space. Similar conclusions can be drawn for the posterior expected Jeffreys losses of $\hat{\Sigma}_R$, $\hat{\Sigma}_J$, and $S$, and their Bayes risks under the Jeffreys loss. It is still difficult to find an explicit expression for the posterior expected Rao loss. Hence, a Monte Carlo simulation with 1000 trials is used to calculate the Bayes risks under the Rao loss. The comparisons are illustrated in Figure 3. In Figure 4, we compare the intrinsic MSEs (see Equation (14)) of the considered estimators with an intrinsic version of the CRB given in ([18], Theorem 4), which serves as a lower bound for any intrinsically unbiased covariance estimator. In fact, for our case, the intrinsic MSEs of $\hat{\Sigma}_R$, $\hat{\Sigma}_J$, and $S$ are equal to their Bayes risks under the Rao loss. Recalling that the MLE $S$ is not intrinsically unbiased but is always asymptotically efficient [18], we can conclude that the Bayes estimator $\hat{\Sigma}_R$ under the Rao loss is intrinsically unbiased and also asymptotically efficient. This property is also demonstrated in Figure 4.
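A Monte Carlo estimate of the Bayes risk under the Rao loss can be sketched as follows. This is not the authors' simulation code: it assumes $\Sigma = I_p$ (the risk of $aS$ does not depend on $\Sigma$), the squared Rao distance $\rho^2(\Sigma, aS) = \frac{1}{2}\sum_i \log^2(a\lambda_i)$ with $\lambda_i$ the eigenvalues of $\Sigma^{-1}S$, and the closed forms of the two scale factors. Names, dimensions, and the seed are hypothetical.

```python
import numpy as np
from scipy.special import digamma


def scale_rao(n, p):
    # Assumed closed form of the Rao-loss scale factor.
    i = np.arange(1, p + 1)
    return 0.5 * n * np.exp(-np.mean(digamma(0.5 * (n - i + 1))))


def scale_jeffreys(n, p):
    # Assumed closed form of the Jeffreys-loss scale factor (needs n > p + 1).
    return np.sqrt(n / (n - p - 1))


def rao_risk(a, n, p, trials=1000, seed=0):
    # Average of rho^2(I_p, a S) = 0.5 * sum_i log^2(a * lambda_i(S)) over simulated S.
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        X = rng.standard_normal((n, p))          # n zero-mean samples, covariance I_p
        lam = np.linalg.eigvalsh((X.T @ X) / n)  # eigenvalues of the sample covariance
        total += 0.5 * np.sum(np.log(a * lam) ** 2)
    return total / trials


n, p = 20, 10
# The same seed is reused so that the three estimators are compared on identical samples.
for name, a in (("Rao", scale_rao(n, p)), ("Jeffreys", scale_jeffreys(n, p)), ("MLE", 1.0)):
    print(name, rao_risk(a, n, p))
```

Reusing the seed gives common random numbers across the three estimators, so the ordering of the estimated risks is less sensitive to simulation noise than the individual values.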

Range-Spread Target Detection
Target detection against a background of Gaussian noise is a basic problem in radar signal processing. Suppose that an unknown target is spatially distributed across $H$ range cells. Data collected from these $H$ cells, called the primary data, consist of the possible target echoes plus Gaussian noise. As in [44], we assume that a set of secondary data coming from $K$ cells arranged around the target is available, which serves as training samples for noise covariance estimation. Here the homogeneous noise environment is considered, which means that the secondary data are free of signal components and share the same distribution as the noise-only part of the primary data. Therefore, the considered detection problem can be formulated as [44]

$$H_0:\ z_i = w_i,\quad i = 1, \ldots, H + K; \qquad H_1:\ \begin{cases} z_i = p_i + w_i, & i = 1, \ldots, H, \\ z_i = w_i, & i = H + 1, \ldots, H + K, \end{cases}$$

where $p_i$, $i = 1, \ldots, H$, denote the $N$-dimensional unknown signal vectors, and $w_i$, $i = 1, \ldots, H + K$, are independent and identically distributed Gaussian noise vectors with mean zero and unknown covariance matrix $M$. For brevity of notation, let us introduce the matrix forms

$$Z_p = [z_1, \ldots, z_H], \qquad Z_s = [z_{H+1}, \ldots, z_{H+K}], \qquad P = [p_1, \ldots, p_H].$$

Denote $Z = [Z_p, Z_s]$ and $\bar{P} = [P, 0]$; then the matrix $Z$ of measurements is Gaussian distributed with mean $\bar{P}$ and covariance $I_{H+K} \otimes M$.
In the absence of a target, we have $P = 0$ under the null hypothesis. The Rao distance between the assumed model and the null model, $\rho\big([\bar{P}, I_{H+K} \otimes M], [0, I_{H+K} \otimes M]\big)$, is available in closed form [45]. We again adopt the Jeffreys prior distribution for the parameters $(P, M)$ [35]. Combining this prior with the joint density of $Z$ gives the posterior expected Rao loss of accepting the null hypothesis, $E_{P,M|Z}\, l_R\big([\bar{P}, I_{H+K} \otimes M], [0, I_{H+K} \otimes M]\big)$. Hence, the decision rule for the compatibility of the null hypothesis with the observed data $Z$ is to reject $H_0$ if and only if this posterior expected loss exceeds a threshold $\gamma$, where $\gamma$ is chosen according to the required false alarm probability. In fact, this result reproduces the two-step generalized likelihood ratio test [46] or Wald test [47], which is verified to have the CFAR property [46]. Moreover, an equivalent decision rule can be derived in this case using the Jeffreys loss.

Finally, to conclude this section, we discuss the relationship between the geometric invariance principle and the CFAR property. If we transform the random variables $x$ to $y$ without altering the structure of the statistical model and the testing problem, this is equivalent to inducing a transformation on the parameter space $\Xi$. Let a decision statistic be an intrinsic concept in the geometrical sense. Then this statistic is invariant under the change from $x$ to $y$, which indicates that it maintains CFAR behavior over an orbit of the parameter space.

Conclusions
In this paper, the Rao loss and the Jeffreys loss, which are based on the squared Rao distance and the Jeffreys divergence respectively, are treated in a unified way in the framework of information geometry. We have elucidated their geometrical meaning based on some results of Riemannian geometry and dualistic geometry. In particular, on the manifold of an exponential family, the Rao loss and the Jeffreys loss are essentially the energy differences along α-geodesic paths parameterized by [0, 1], for α = −1, 0, 1. Consequently, they enjoy the invariance properties under one-to-one transformations of random variables and model parameters. Subsequently, these two intrinsic loss functions are utilized to develop intrinsic Bayesian analyses of covariance estimation and range-spread target detection. We use the intrinsic losses to derive the scale factor of the sample covariance matrix, which leads to an intrinsically unbiased and asymptotically efficient covariance estimator. On the other hand, the detectors provided by the posterior expected intrinsic losses have been shown to coincide with the classic CFAR detector in radar detection theory. In this respect, intrinsic analysis seems to be a novel and promising approach to deriving a detector with the CFAR property.

Table 1. The scale factors of the sample covariance matrix for some values of n and p = 10.