Information Geometry for Covariance Estimation in Heterogeneous Clutter with Total Bregman Divergence

This paper presents a covariance matrix estimation method based on information geometry for heterogeneous clutter. In particular, the problem of covariance estimation is reformulated as the computation of a geometric median of covariance matrices estimated from the secondary data set. A new class of total Bregman divergence is presented on the Riemannian manifold of Hermitian positive-definite (HPD) matrices, which is the foundation of information geometry. On the basis of this divergence, total Bregman divergence medians are derived and used in place of the sample covariance matrix (SCM) of the secondary data. Unlike the SCM, which relies on knowledge of the statistical characteristics of the sample data, the proposed estimators take the geometric structure of the matrix space into account, which improves performance in heterogeneous clutter. At the analysis stage, numerical results are given to validate the detection performance of an adaptive normalized matched filter with our estimators compared with existing alternatives.


Introduction
The estimation of the disturbance covariance matrix is an important subject in the field of advanced radar signal processing. Many algorithms operate on the covariance matrix of sample data, for example in array signal processing [1,2], multichannel signal processing [3,4], adaptive radar detection [5][6][7], and space-time adaptive processing [8][9][10]. The commonly used estimator is the sample covariance matrix (SCM), which is derived as the maximum-likelihood (ML) estimate from K N-dimensional secondary data, under the assumption that these K secondary data are independent and identically distributed zero-mean complex circular Gaussian vectors. Usually, the solution of this ML problem exists when the number of secondary data K is no smaller than the matrix dimension N (K ≥ N). In particular, it can achieve good performance for K ≥ 2N [11]. Unfortunately, in real heterogeneous clutter, the statistical distribution of the whole environment is very difficult to obtain, since the secondary data could be contaminated by power variations, clutter discretes, and/or outliers. Therefore, it is worthwhile to obtain an estimate of the disturbance covariance matrix that does not rely on the statistical characterization of the whole environment.
Many covariance estimation algorithms that are derived from the geometry of the matrix space, and that do not resort to the statistical characterization of the sample data, are reported in the literature. For instance, the Riemannian mean and median of covariance matrices are proposed to design radar target detectors [12][13][14][15][16]. In [17][18][19][20], the median is used for covariance estimation in many radar processing applications. The Riemannian mean is utilized to estimate the covariance matrix in space-time adaptive processing [21][22][23]. The results have shown that the projection algorithm with the Riemannian mean can yield significant performance gains. In [24,25], some geometric barycenters and medians are proposed for radar training data selection in a homogeneous environment. In [26], a geometric method is presented for covariance estimation, where each estimator is associated with a given unitary invariant norm and performs the projection of the sample covariance matrix onto a specific set of structured covariance matrices. Recently, geometric barycenters of symmetric positive definite matrices have been used as covariance estimators in compound-Gaussian clutter [27]. Moreover, in our previous work [28,29], we have proposed several divergence means and medians of covariance matrices computed in a neighborhood of the cell under test for radar target detection. Finally, the geometric approach is also used in many other applications; for instance, the Bhattacharyya mean and median are exploited for diffusion tensor magnetic resonance imaging (DT-MRI) segmentation [30,31]. In these contexts, the geometric approaches have achieved good performance.
In this paper, a new class of total Bregman divergence (tBD) is presented on the Riemannian manifold. Based on the tBD, three geometric medians, namely the total square loss (TSL) median, the total von Neumann (TVN) median, and the total LogDet (TLD) median, are derived and used as estimators of the disturbance covariance matrix. As a matter of fact, the SCM of K secondary data can be seen as the arithmetic mean of K autocovariance matrices of rank one. The arithmetic mean ignores the fact that these matrices lie in a nonlinear matrix space, the Riemannian manifold, which is the foundation of information geometry. Because the geometric medians do not rely on the statistical characteristics of the sample data, the performance of covariance estimation in heterogeneous clutter can be improved.
The rest of this paper is organized as follows: Section 2 formulates the problem of covariance estimation from the viewpoint of information geometry; the total Bregman divergence-based estimators are presented in Section 3; Section 4 gives the theoretical analysis of the robustness of the proposed estimators; experimental results are presented in Section 5; Section 6 concludes our work.
Notation: The following notation is used throughout this article. A matrix A and a vector x are denoted by uppercase bold and lowercase bold letters, respectively. The conjugate transpose of matrix A is denoted as A†. tr(A) is the trace of matrix A. det(A) is the determinant of matrix A. I denotes the identity matrix. Finally, E(·) denotes statistical expectation.

Problem Formulated from Information Geometry
In this section, a heterogeneous environment is considered for covariance estimation. The secondary data $\mathbf{r}_k$, $k = 1, 2, \ldots, K$, are spherically invariant random vectors, which can be expressed as
$$ \mathbf{r}_k = \sqrt{\tau_k}\,\mathbf{g}_k, \quad k = 1, 2, \ldots, K, \tag{1} $$
where $\tau_k$ is a nonnegative scalar random variable, and the $\mathbf{g}_k$ are N-dimensional circularly symmetric zero-mean vectors with an arbitrary joint statistical distribution that share the same covariance matrix,
$$ \boldsymbol{\Sigma} = E\left(\mathbf{g}_k \mathbf{g}_k^{\dagger}\right), \tag{2} $$
where † denotes the conjugate transpose. Since the statistical characterization of the secondary data is not known in heterogeneous clutter, the classic approaches, e.g., ML and minimum mean-square error, cannot be applied for covariance estimation of the sample data. Thus, covariance estimators that do not depend on the probability distribution of the whole environment are very promising. In this paper, a covariance estimation method based on information geometry is proposed.
Recall that the SCM of the K secondary data $\mathbf{r}_k$, $k = 1, 2, \ldots, K$, is given by
$$ \hat{\boldsymbol{\Sigma}} = \frac{1}{K}\sum_{k=1}^{K} \mathbf{r}_k \mathbf{r}_k^{\dagger}, \tag{3} $$
where $\mathbf{r}_k \mathbf{r}_k^{\dagger}$ is an autocovariance matrix of rank one, and is therefore singular. It can be noted from Equation (3) that $\hat{\boldsymbol{\Sigma}}$ is the arithmetic mean of K autocovariance matrices. In fact, these matrices lie in the nonlinear Hermitian matrix space, as the sample data are complex. It is well known that Hermitian positive-definite (HPD) matrices form a differentiable Riemannian (and also a Finsler) manifold [32,33], which is the most studied example of a manifold with nonpositive curvature [34,35]. In order to facilitate the analysis, each singular matrix $\mathbf{r}_k \mathbf{r}_k^{\dagger}$ is made positive definite by adding the identity matrix, $\mathbf{R}_k = \mathbf{r}_k \mathbf{r}_k^{\dagger} + \mathbf{I}$. Then, the disturbance covariance matrix can be estimated by a median related to a divergence on the Riemannian manifold of HPD matrices.
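As a small illustration of the SCM as an arithmetic mean of rank-one matrices, and of the loading step that adds the identity matrix, the following sketch builds both from synthetic snapshots. It assumes NumPy; the function names `sample_covariance` and `loaded_rank_one` and the Gaussian test data are hypothetical choices made here for illustration only.

```python
import numpy as np

def sample_covariance(r):
    """SCM of K snapshots stored as the rows of r (shape K x N)."""
    K = r.shape[0]
    # Sum over k of the rank-one outer products r_k r_k^dagger, divided by K.
    return np.einsum("ki,kj->ij", r, r.conj()) / K

def loaded_rank_one(r):
    """Stack of R_k = r_k r_k^dagger + I, shape (K, N, N)."""
    N = r.shape[1]
    outer = np.einsum("ki,kj->kij", r, r.conj())
    return outer + np.eye(N)

# Hypothetical test data: K = 16 complex Gaussian snapshots of dimension N = 4.
rng = np.random.default_rng(0)
K, N = 16, 4
r = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2)

scm = sample_covariance(r)   # arithmetic mean of the rank-one matrices
R = loaded_rank_one(r)       # HPD matrices, each with smallest eigenvalue 1
```

Each `R[k]` is HPD, so it is a valid point on the manifold, whereas the unloaded outer products are only positive semidefinite.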
As illustrated in Figure 1, the geometric median is computed on the Riemannian manifold of HPD matrices with a non-Euclidean metric, whereas the arithmetic mean is computed in Euclidean space. The two estimators therefore account for different geometric structures. Moreover, for different metrics on the Riemannian manifold, the performance of covariance estimation may be very different; such results are reported in [36]. In the next section, a new class of tBD is proposed on the Riemannian manifold. On the basis of the tBD, the tBD median is derived and used as the estimator.

Total Bregman Divergence-Based Estimators on the Manifold
In this section, a new class of tBD is proposed on the Riemannian manifold. Then, the medians associated with the tBD are derived.

The Geometry of HPD Matrices
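Let $H(n) = \{\mathbf{A} \mid \mathbf{A} = \mathbf{A}^{\dagger}\}$ denote the space of n × n Hermitian matrices. A Hermitian matrix A is an HPD matrix if the quadratic form $\mathbf{x}^{\dagger}\mathbf{A}\mathbf{x} > 0$ for all nonzero $\mathbf{x} \in C(n)$, where C(n) is the space of n-dimensional complex vectors. The set of all n × n HPD matrices, P(n), forms a Riemannian manifold of real dimension $n^2$ with non-positive curvature [35]. For a point A on the Riemannian manifold, the infinitesimal arclength between A and A + dA is given by
$$ ds = \left[\mathrm{tr}\left(\left(\mathbf{A}^{-1} d\mathbf{A}\right)^{2}\right)\right]^{1/2} = \left\| \mathbf{A}^{-1/2}\, d\mathbf{A}\, \mathbf{A}^{-1/2} \right\|_F, \tag{4} $$
where ds defines a metric on the Riemannian manifold [37] and $\|\cdot\|_F$ is the Frobenius norm of a matrix. The inner product and corresponding norm on the tangent space at the point A can be defined as [38]
$$ \langle \mathbf{P}_1, \mathbf{P}_2 \rangle_{\mathbf{A}} = \mathrm{tr}\left(\mathbf{A}^{-1}\mathbf{P}_1 \mathbf{A}^{-1}\mathbf{P}_2\right), \tag{5} $$
$$ \|\mathbf{P}_1\|_{\mathbf{A}} = \langle \mathbf{P}_1, \mathbf{P}_1 \rangle_{\mathbf{A}}^{1/2}. \tag{6} $$
For two points $\mathbf{P}_1$ and $\mathbf{P}_2$ on the Riemannian manifold, the affine-invariant (Riemannian) distance is given by [39]
$$ d_R(\mathbf{P}_1, \mathbf{P}_2) = \left\| \mathrm{logm}\left(\mathbf{P}_1^{-1/2}\,\mathbf{P}_2\,\mathbf{P}_1^{-1/2}\right) \right\|_F, \tag{7} $$
where logm denotes the matrix logarithm.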
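The affine-invariant distance can be evaluated with two Hermitian eigendecompositions. The sketch below, which assumes NumPy, is one way to do so and also checks invariance under a congruence transformation numerically; the helper `random_hpd` and the test matrices are hypothetical and serve only as an illustration.

```python
import numpy as np

def riemannian_distance(P1, P2):
    """Affine-invariant distance ||logm(P1^{-1/2} P2 P1^{-1/2})||_F."""
    w, V = np.linalg.eigh(P1)
    P1_inv_sqrt = (V / np.sqrt(w)) @ V.conj().T   # V diag(w^{-1/2}) V^dagger
    M = P1_inv_sqrt @ P2 @ P1_inv_sqrt
    M = (M + M.conj().T) / 2                      # remove rounding asymmetry
    lam = np.linalg.eigvalsh(M)
    # For Hermitian M, ||logm(M)||_F^2 is the sum of squared log-eigenvalues.
    return np.sqrt(np.sum(np.log(lam) ** 2))

def random_hpd(n, rng):
    """Hypothetical helper: a random, well-conditioned HPD matrix."""
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return A @ A.conj().T + n * np.eye(n)

rng = np.random.default_rng(1)
P1, P2 = random_hpd(3, rng), random_hpd(3, rng)
# A random complex matrix is invertible with probability one.
C = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

d = riemannian_distance(P1, P2)
d_congruent = riemannian_distance(C @ P1 @ C.conj().T, C @ P2 @ C.conj().T)
```

The two values `d` and `d_congruent` agree up to rounding, which is the affine invariance that gives the distance its name.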

Total Bregman Divergence
The tBD was proposed on the space of convex functions by Baba C. Vemuri, and has been used for DT-MRI analysis [40], shape retrieval [41,42], and object tracking [43]. For $x, y \in \mathbb{R}$ and a differentiable, strictly convex function f, the total Bregman divergence $\delta_f$ between x and y is defined as [40]
$$ \delta_f(x, y) = \frac{f(x) - f(y) - (x - y)\, f'(y)}{\sqrt{1 + \left(f'(y)\right)^2}}, \tag{8} $$
where $f'(y)$ is the derivative of f at y. As illustrated in Figure 2, the tBD between x and y is the orthogonal distance between the point $(x, f(x))$ on the graph of the convex and differentiable function f and the tangent to f at the second argument y. It can also be noted from Figure 2 that when the coordinate system is rotated by an angle, the tBD between x and y does not change. This implies that the tBD is invariant to rotations of the coordinate system, a property that is associated with its statistical robustness. In the following, we extend the definition of tBD to the Riemannian manifold of HPD matrices.
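The geometric reading of the scalar tBD can be checked numerically. The sketch below computes the divergence from its definition and, separately, the orthogonal distance from the point (x, f(x)) to the tangent of f at y; the function names and the choice f(x) = x² are illustrative assumptions.

```python
import math

def total_bregman(f, df, x, y):
    """Scalar tBD: the Bregman divergence divided by sqrt(1 + f'(y)^2)."""
    bregman = f(x) - f(y) - (x - y) * df(y)
    return bregman / math.sqrt(1.0 + df(y) ** 2)

def distance_to_tangent(f, df, x, y):
    """Orthogonal distance from (x, f(x)) to the tangent line of f at y."""
    # The tangent passes through (y, f(y)) with direction (1, f'(y)).
    dx, dy = x - y, f(x) - f(y)
    tx, ty = 1.0, df(y)
    cross = dx * ty - dy * tx          # 2-D cross product
    return abs(cross) / math.hypot(tx, ty)

square = lambda t: t * t               # f(x) = x^2
d_square = lambda t: 2.0 * t           # f'(x) = 2x

tbd = total_bregman(square, d_square, 3.0, 1.0)
dist = distance_to_tangent(square, d_square, 3.0, 1.0)
```

For this choice both quantities equal 4/√5, since the ordinary Bregman divergence is (3 − 1)² = 4 and the tangent slope at y = 1 is 2.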
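Definition 1. Let f be a strictly convex and differentiable function. For two HPD matrices X and Y, the tBD is defined as
$$ \delta_f(\mathbf{X}, \mathbf{Y}) = \frac{\mathrm{tr}\left( f(\mathbf{X}) - f(\mathbf{Y}) - f'(\mathbf{Y})(\mathbf{X} - \mathbf{Y}) \right)}{\sqrt{1 + \left\| f'(\mathbf{Y}) \right\|_F^2}}. \tag{9} $$
In the following, we give the definitions of three new divergences obtained for different forms of the function f.
Let $f(x) = x^2$; then $\mathrm{tr}(f(\mathbf{X})) = \|\mathbf{X}\|_F^2$, $f'(\mathbf{Y}) = 2\mathbf{Y}$, and Equation (9) gives the TSL,
$$ \delta_{TSL}(\mathbf{X}, \mathbf{Y}) = \frac{\|\mathbf{X} - \mathbf{Y}\|_F^2}{\sqrt{1 + 4\|\mathbf{Y}\|_F^2}}. \tag{10} $$
If $f(x) = x\log x - x$, then $\mathrm{tr}(f(\mathbf{X})) = \mathrm{tr}(\mathbf{X}\log\mathbf{X} - \mathbf{X})$ and $f'(\mathbf{Y}) = \log\mathbf{Y}$, and Equation (9) yields the divergence called the TVN,
$$ \delta_{TVN}(\mathbf{X}, \mathbf{Y}) = \frac{\mathrm{tr}\left(\mathbf{X}\log\mathbf{X} - \mathbf{X}\log\mathbf{Y} - \mathbf{X} + \mathbf{Y}\right)}{\sqrt{1 + \|\log\mathbf{Y}\|_F^2}}. \tag{11} $$
When $f(x) = -\log x$, we have $\mathrm{tr}(f(\mathbf{X})) = -\log\det\mathbf{X}$ and $f'(\mathbf{Y}) = -\mathbf{Y}^{-1}$, and we obtain the divergence referred to as the TLD, or the total Stein loss,
$$ \delta_{TLD}(\mathbf{X}, \mathbf{Y}) = \frac{\mathrm{tr}\left(\mathbf{Y}^{-1}\mathbf{X}\right) - \log\det\left(\mathbf{Y}^{-1}\mathbf{X}\right) - n}{\sqrt{1 + \|\mathbf{Y}^{-1}\|_F^2}}, \tag{12} $$
where n is the matrix dimension. Based on the three divergences on the Riemannian manifold, the medians for a finite set of HPD matrices are derived in the next section.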
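For concreteness, the three divergences can be sketched in NumPy as follows. The closed forms used here are obtained by substituting f(x) = x², f(x) = x log x − x, and f(x) = −log x into the matrix tBD, with the matrix logarithm computed by eigendecomposition; the function names are hypothetical.

```python
import numpy as np

def logm_hpd(A):
    """Matrix logarithm of an HPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.conj().T

def fro2(A):
    """Squared Frobenius norm."""
    return np.linalg.norm(A, "fro") ** 2

def tsl(X, Y):
    """Total square loss: ||X - Y||_F^2 / sqrt(1 + 4 ||Y||_F^2)."""
    return fro2(X - Y) / np.sqrt(1.0 + 4.0 * fro2(Y))

def tvn(X, Y):
    """Total von Neumann: tr(X log X - X log Y - X + Y) / sqrt(1 + ||log Y||_F^2)."""
    log_y = logm_hpd(Y)
    num = np.trace(X @ (logm_hpd(X) - log_y) - X + Y).real
    return num / np.sqrt(1.0 + fro2(log_y))

def tld(X, Y):
    """Total LogDet: (tr(Y^-1 X) - log det(Y^-1 X) - n) / sqrt(1 + ||Y^-1||_F^2)."""
    n = X.shape[0]
    M = np.linalg.solve(Y, X)                 # Y^{-1} X
    _, logdet = np.linalg.slogdet(M)          # log|det|, det is positive here
    num = np.trace(M).real - logdet - n
    return num / np.sqrt(1.0 + fro2(np.linalg.inv(Y)))

# Hypothetical test matrices.
rng = np.random.default_rng(5)
A = rng.standard_normal((2, 3, 3)) + 1j * rng.standard_normal((2, 3, 3))
X, Y = (Ai @ Ai.conj().T + np.eye(3) for Ai in A)
values = {name: fn(X, Y) for name, fn in [("TSL", tsl), ("TVN", tvn), ("TLD", tld)]}
```

All three values are nonnegative and vanish when X = Y, but none of the divergences is symmetric in its arguments.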

Total Bregman Divergence Median for HPD Matrices
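For a set of HPD matrices, the median related to a measure is defined as the minimizer of the sum of the distances to the given matrices.
Definition 2. Let f be a differentiable convex function; then, the median related to the tBD of Equation (9), of m HPD matrices $\{\mathbf{R}_1, \mathbf{R}_2, \ldots, \mathbf{R}_m\}$, is defined as
$$ \hat{\mathbf{R}} = \arg\min_{\mathbf{R}} F(\mathbf{R}), \tag{13} $$
$$ F(\mathbf{R}) = \sum_{i=1}^{m} \delta_f(\mathbf{R}, \mathbf{R}_i). \tag{14} $$
It is worth pointing out that $\hat{\mathbf{R}}$ is the arithmetic mean if the squared Frobenius distance $\|\mathbf{R} - \mathbf{R}_i\|_F^2$, instead of the tBD, is used.
Setting the gradient $\nabla F(\mathbf{R})$ equal to zero,
$$ \nabla F(\mathbf{R}) = \sum_{i=1}^{m} \frac{f'(\mathbf{R}) - f'(\mathbf{R}_i)}{\sqrt{1 + \|f'(\mathbf{R}_i)\|_F^2}} = \mathbf{0}. \tag{15} $$
Solving Equation (15) yields
$$ \hat{\mathbf{R}} = (f')^{-1}\left( \left( \sum_{i=1}^{m} \frac{1}{\sqrt{1 + \|f'(\mathbf{R}_i)\|_F^2}} \right)^{-1} \sum_{i=1}^{m} \frac{f'(\mathbf{R}_i)}{\sqrt{1 + \|f'(\mathbf{R}_i)\|_F^2}} \right). \tag{16} $$
Each tBD $\delta_f(\mathbf{R}, \mathbf{R}_i)$ is a convex function of R, and $F(\mathbf{R})$, the sum of m convex functions, is also a convex function. Hence, the median $\hat{\mathbf{R}}$ indeed exists. Moreover, since f is strictly convex, $f'$ is monotonic, and the median $\hat{\mathbf{R}}$ is unique.
In the following, the medians are derived for three concrete choices of f.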

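Proposition 2. The median related to the TSL (10) of m HPD matrices $\{\mathbf{R}_1, \mathbf{R}_2, \ldots, \mathbf{R}_m\}$ is given by
$$ \hat{\mathbf{R}} = \left( \sum_{i=1}^{m} \frac{1}{\sqrt{1 + 4\|\mathbf{R}_i\|_F^2}} \right)^{-1} \sum_{i=1}^{m} \frac{\mathbf{R}_i}{\sqrt{1 + 4\|\mathbf{R}_i\|_F^2}}. \tag{17} $$
Proposition 3. The median related to the TVN divergence (11) of m HPD matrices $\{\mathbf{R}_1, \mathbf{R}_2, \ldots, \mathbf{R}_m\}$ is given by
$$ \hat{\mathbf{R}} = \exp\left( \left( \sum_{i=1}^{m} \frac{1}{\sqrt{1 + \|\log\mathbf{R}_i\|_F^2}} \right)^{-1} \sum_{i=1}^{m} \frac{\log\mathbf{R}_i}{\sqrt{1 + \|\log\mathbf{R}_i\|_F^2}} \right). \tag{18} $$
Proposition 4. The median related to the TLD divergence (12) of m HPD matrices $\{\mathbf{R}_1, \mathbf{R}_2, \ldots, \mathbf{R}_m\}$ is given by
$$ \hat{\mathbf{R}} = \left( \left( \sum_{i=1}^{m} \frac{1}{\sqrt{1 + \|\mathbf{R}_i^{-1}\|_F^2}} \right)^{-1} \sum_{i=1}^{m} \frac{\mathbf{R}_i^{-1}}{\sqrt{1 + \|\mathbf{R}_i^{-1}\|_F^2}} \right)^{-1}. \tag{19} $$
Proof of Propositions 2, 3, and 4. With the help of the formulas $\partial \|\mathbf{R}\|_F^2 / \partial\mathbf{R} = 2\mathbf{R}$, $\partial\, \mathrm{tr}(\mathbf{R}\log\mathbf{R} - \mathbf{R}) / \partial\mathbf{R} = \log\mathbf{R}$, and $\partial(-\log\det\mathbf{R}) / \partial\mathbf{R} = -\mathbf{R}^{-1}$, we have
$$ \nabla F(\mathbf{R}) = \sum_{i=1}^{m} \frac{2\mathbf{R} - 2\mathbf{R}_i}{\sqrt{1 + 4\|\mathbf{R}_i\|_F^2}}, \tag{20} $$
$$ \nabla F(\mathbf{R}) = \sum_{i=1}^{m} \frac{\log\mathbf{R} - \log\mathbf{R}_i}{\sqrt{1 + \|\log\mathbf{R}_i\|_F^2}}, \tag{21} $$
$$ \nabla F(\mathbf{R}) = \sum_{i=1}^{m} \frac{\mathbf{R}_i^{-1} - \mathbf{R}^{-1}}{\sqrt{1 + \|\mathbf{R}_i^{-1}\|_F^2}}. \tag{22} $$
These three equations correspond to the three propositions above, in the same order. Equating these gradients to zero and solving for R yields the medians.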

Robustness Analysis of Total Bregman Divergence Median
This section is devoted to analyzing the robustness of the total Bregman divergence median and of the arithmetic mean via the influence function. Let $\hat{\mathbf{R}}$ be the median, associated with a total Bregman divergence, of m HPD matrices $\{\mathbf{R}_1, \ldots, \mathbf{R}_m\}$, and let $\tilde{\mathbf{R}}$ be the median obtained by adding a set of n outliers $\{\mathbf{Q}_1, \ldots, \mathbf{Q}_n\}$ with a weight ε (ε ≪ 1) to $\{\mathbf{R}_1, \ldots, \mathbf{R}_m\}$. Then, we can write $\tilde{\mathbf{R}} = \hat{\mathbf{R}} + \varepsilon \mathbf{H}(\mathbf{Q})$, where $\mathbf{H}(\mathbf{Q})$ denotes the influence function. In the following, four propositions are presented.

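Proposition 5. The influence function of the arithmetic mean related to the Frobenius norm, of m HPD matrices $\{\mathbf{R}_1, \ldots, \mathbf{R}_m\}$ and n outliers $\{\mathbf{Q}_1, \ldots, \mathbf{Q}_n\}$, is given by
$$ \mathbf{H}(\mathbf{Q}) = \frac{1}{n}\sum_{j=1}^{n} \mathbf{Q}_j - \hat{\mathbf{R}}. \tag{23} $$
Proof of Proposition 5. Let $F(\mathbf{R})$ be the objective function
$$ F(\mathbf{R}) = (1 - \varepsilon)\frac{1}{m}\sum_{i=1}^{m} \|\mathbf{R} - \mathbf{R}_i\|_F^2 + \varepsilon\frac{1}{n}\sum_{j=1}^{n} \|\mathbf{R} - \mathbf{Q}_j\|_F^2. \tag{24} $$
The derivative of the objective function $F(\mathbf{R})$ is
$$ \nabla F(\mathbf{R}) = 2(1 - \varepsilon)\frac{1}{m}\sum_{i=1}^{m} (\mathbf{R} - \mathbf{R}_i) + 2\varepsilon\frac{1}{n}\sum_{j=1}^{n} (\mathbf{R} - \mathbf{Q}_j). \tag{25} $$
Note that $\tilde{\mathbf{R}}$ is the mean of the m matrices and n outliers, so that $\nabla F(\tilde{\mathbf{R}}) = \mathbf{0}$, and $\hat{\mathbf{R}} = \frac{1}{m}\sum_{i=1}^{m}\mathbf{R}_i$ is the mean of the m matrices; then, we have
$$ \tilde{\mathbf{R}} = (1 - \varepsilon)\hat{\mathbf{R}} + \varepsilon\frac{1}{n}\sum_{j=1}^{n} \mathbf{Q}_j. \tag{26} $$
Substituting $\tilde{\mathbf{R}} = \hat{\mathbf{R}} + \varepsilon\mathbf{H}(\mathbf{Q})$ into Equation (26), we have
$$ \hat{\mathbf{R}} + \varepsilon\mathbf{H}(\mathbf{Q}) = (1 - \varepsilon)\hat{\mathbf{R}} + \varepsilon\frac{1}{n}\sum_{j=1}^{n} \mathbf{Q}_j, \tag{27} $$
$$ \mathbf{H}(\mathbf{Q}) = \frac{1}{n}\sum_{j=1}^{n} \mathbf{Q}_j - \hat{\mathbf{R}}. \tag{28} $$
Proposition 6. The influence function of the TSL median related to TSL, of m HPD matrices $\{\mathbf{R}_1, \ldots, \mathbf{R}_m\}$ and n outliers $\{\mathbf{Q}_1, \ldots, \mathbf{Q}_n\}$, is given by
$$ \mathbf{H}(\mathbf{Q}) = \frac{\frac{1}{n}\sum_{j=1}^{n} \dfrac{\mathbf{Q}_j - \hat{\mathbf{R}}}{\sqrt{1 + 4\|\mathbf{Q}_j\|_F^2}}}{\frac{1}{m}\sum_{i=1}^{m} \dfrac{1}{\sqrt{1 + 4\|\mathbf{R}_i\|_F^2}}}. \tag{29} $$
Proposition 7. The influence function of the TVN median related to TVN, of m HPD matrices $\{\mathbf{R}_1, \ldots, \mathbf{R}_m\}$ and n outliers $\{\mathbf{Q}_1, \ldots, \mathbf{Q}_n\}$, is given by
$$ \mathbf{H}(\mathbf{Q}) = \hat{\mathbf{R}}\, \frac{\frac{1}{n}\sum_{j=1}^{n} \dfrac{\log\mathbf{Q}_j - \log\hat{\mathbf{R}}}{\sqrt{1 + \|\log\mathbf{Q}_j\|_F^2}}}{\frac{1}{m}\sum_{i=1}^{m} \dfrac{1}{\sqrt{1 + \|\log\mathbf{R}_i\|_F^2}}}. \tag{30} $$
Proposition 8. The influence function of the TLD median related to TLD, of m HPD matrices $\{\mathbf{R}_1, \ldots, \mathbf{R}_m\}$ and n outliers $\{\mathbf{Q}_1, \ldots, \mathbf{Q}_n\}$, is given by
$$ \mathbf{H}(\mathbf{Q}) = \hat{\mathbf{R}}^{2}\, \frac{\frac{1}{n}\sum_{j=1}^{n} \dfrac{\hat{\mathbf{R}}^{-1} - \mathbf{Q}_j^{-1}}{\sqrt{1 + \|\mathbf{Q}_j^{-1}\|_F^2}}}{\frac{1}{m}\sum_{i=1}^{m} \dfrac{1}{\sqrt{1 + \|\mathbf{R}_i^{-1}\|_F^2}}}. \tag{31} $$
Proof of Propositions 6, 7, and 8. According to Equation (9), let $F(\mathbf{R})$ be the objective function
$$ F(\mathbf{R}) = (1 - \varepsilon)\frac{1}{m}\sum_{i=1}^{m} \delta_f(\mathbf{R}, \mathbf{R}_i) + \varepsilon\frac{1}{n}\sum_{j=1}^{n} \delta_f(\mathbf{R}, \mathbf{Q}_j). \tag{32} $$
Note that $\tilde{\mathbf{R}}$ is the median of the m matrices and n outliers; then, we have
$$ (1 - \varepsilon)\frac{1}{m}\sum_{i=1}^{m} \frac{f'(\tilde{\mathbf{R}}) - f'(\mathbf{R}_i)}{\sqrt{1 + \|f'(\mathbf{R}_i)\|_F^2}} + \varepsilon\frac{1}{n}\sum_{j=1}^{n} \frac{f'(\tilde{\mathbf{R}}) - f'(\mathbf{Q}_j)}{\sqrt{1 + \|f'(\mathbf{Q}_j)\|_F^2}} = \mathbf{0}. \tag{33} $$
Using the Taylor expansion of the derivative function $f'$ at $\tilde{\mathbf{R}} = \hat{\mathbf{R}} + \varepsilon\mathbf{H}(\mathbf{Q})$, we can obtain
$$ f'(\tilde{\mathbf{R}}) \approx f'(\hat{\mathbf{R}}) + \varepsilon f''(\hat{\mathbf{R}})\mathbf{H}(\mathbf{Q}). \tag{34} $$
Substituting Equation (34) into Equation (33),
$$ (1 - \varepsilon)\frac{1}{m}\sum_{i=1}^{m} \frac{f'(\hat{\mathbf{R}}) + \varepsilon f''(\hat{\mathbf{R}})\mathbf{H}(\mathbf{Q}) - f'(\mathbf{R}_i)}{\sqrt{1 + \|f'(\mathbf{R}_i)\|_F^2}} + \varepsilon\frac{1}{n}\sum_{j=1}^{n} \frac{f'(\hat{\mathbf{R}}) + \varepsilon f''(\hat{\mathbf{R}})\mathbf{H}(\mathbf{Q}) - f'(\mathbf{Q}_j)}{\sqrt{1 + \|f'(\mathbf{Q}_j)\|_F^2}} = \mathbf{0}. \tag{35} $$
As $\hat{\mathbf{R}}$ is the median of the m matrices, we have
$$ \sum_{i=1}^{m} \frac{f'(\hat{\mathbf{R}}) - f'(\mathbf{R}_i)}{\sqrt{1 + \|f'(\mathbf{R}_i)\|_F^2}} = \mathbf{0}. \tag{36} $$
Substituting Equation (36) into Equation (35), and ignoring the terms containing $\varepsilon^2$,
$$ \varepsilon f''(\hat{\mathbf{R}})\mathbf{H}(\mathbf{Q})\,\frac{1}{m}\sum_{i=1}^{m} \frac{1}{\sqrt{1 + \|f'(\mathbf{R}_i)\|_F^2}} + \varepsilon\frac{1}{n}\sum_{j=1}^{n} \frac{f'(\hat{\mathbf{R}}) - f'(\mathbf{Q}_j)}{\sqrt{1 + \|f'(\mathbf{Q}_j)\|_F^2}} = \mathbf{0}. \tag{37} $$
Finally, the influence function of the total Bregman divergence median can be derived as
$$ \mathbf{H}(\mathbf{Q}) = -\left[f''(\hat{\mathbf{R}})\right]^{-1} \frac{\frac{1}{n}\sum_{j=1}^{n} \dfrac{f'(\hat{\mathbf{R}}) - f'(\mathbf{Q}_j)}{\sqrt{1 + \|f'(\mathbf{Q}_j)\|_F^2}}}{\frac{1}{m}\sum_{i=1}^{m} \dfrac{1}{\sqrt{1 + \|f'(\mathbf{R}_i)\|_F^2}}}. \tag{38} $$
Let $f(\mathbf{R}) = \mathbf{R}^2$; then, $f'(\mathbf{R}) = 2\mathbf{R}$ and $f''(\mathbf{R}) = 2$. Substituting these into Equation (38), we obtain Proposition 6. Propositions 7 and 8 follow in the same way, with $f'(\mathbf{R}) = \log\mathbf{R}$, $f''(\mathbf{R}) = \mathbf{R}^{-1}$ and with $f'(\mathbf{R}) = -\mathbf{R}^{-1}$, $f''(\mathbf{R}) = \mathbf{R}^{-2}$, respectively.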
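In addition, we show that the influence value of the total Bregman divergence median has an upper bound when the number of outliers is n = 1. Let
$$ w(\mathbf{R}_i) = \sqrt{1 + \|f'(\mathbf{R}_i)\|_F^2}, \quad w(\mathbf{Q}) = \sqrt{1 + \|f'(\mathbf{Q})\|_F^2}. \tag{39} $$
Then, Equation (38) can be rewritten as
$$ \mathbf{H}(\mathbf{Q}) = -\left[f''(\hat{\mathbf{R}})\right]^{-1} \frac{f'(\hat{\mathbf{R}}) - f'(\mathbf{Q})}{w(\mathbf{Q})\,\frac{1}{m}\sum_{i=1}^{m} \frac{1}{w(\mathbf{R}_i)}}. \tag{40} $$
Note that $w(\mathbf{Q}) = \sqrt{1 + \|f'(\mathbf{Q})\|_F^2} \ge \|f'(\mathbf{Q})\|_F$ and $w(\mathbf{Q}) \ge 1$; then, we have the following inequality:
$$ \|\mathbf{H}(\mathbf{Q})\|_F \le \left\|\left[f''(\hat{\mathbf{R}})\right]^{-1}\right\|_F \frac{\|f'(\hat{\mathbf{R}})\|_F + \|f'(\mathbf{Q})\|_F}{w(\mathbf{Q})\,\frac{1}{m}\sum_{i=1}^{m} \frac{1}{w(\mathbf{R}_i)}} \le \left\|\left[f''(\hat{\mathbf{R}})\right]^{-1}\right\|_F \frac{\|f'(\hat{\mathbf{R}})\|_F + 1}{\frac{1}{m}\sum_{i=1}^{m} \frac{1}{w(\mathbf{R}_i)}} = c, \tag{41} $$
where c is a constant that does not depend on Q. From the inequality (41), it follows that the influence function $\mathbf{H}(\mathbf{Q})$ has an upper bound when Q varies. This implies that our proposed tBD medians are robust.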
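The boundedness argument can be illustrated numerically for the TSL case with a single outlier. The sketch below assumes that the contaminated objective weights the m clean matrices by (1 − ε)/m and the outlier by ε, in which case the TSL influence function reduces to H(Q) = (Q − R̂) / (sqrt(1 + 4||Q||_F²) · a), with a the average of 1/sqrt(1 + 4||R_i||_F²). That weighting, the function names, and the test data are assumptions made for this illustration.

```python
import numpy as np

def tsl_weight(R):
    """TSL weight 1 / sqrt(1 + 4 ||R||_F^2)."""
    return 1.0 / np.sqrt(1.0 + 4.0 * np.linalg.norm(R) ** 2)

def tsl_median(Rs, Qs=(), eps=0.0):
    """Minimizer of (1-eps)/m * sum_i TSL(R, R_i) + eps/n * sum_j TSL(R, Q_j)."""
    terms = [((1.0 - eps) / len(Rs) * tsl_weight(R), R) for R in Rs]
    terms += [(eps / len(Qs) * tsl_weight(Q), Q) for Q in Qs]
    total = sum(c for c, _ in terms)
    return sum(c * M for c, M in terms) / total

def tsl_influence(Rs, Q):
    """First-order influence of one outlier Q on the TSL median."""
    a = np.mean([tsl_weight(R) for R in Rs])
    return tsl_weight(Q) * (Q - tsl_median(Rs)) / a

def mean_influence(Rs, Q):
    """Influence of one outlier Q on the arithmetic mean."""
    return Q - np.mean(Rs, axis=0)

# Hypothetical data: 10 clean HPD matrices and one outlier direction Q0.
rng = np.random.default_rng(6)
A = rng.standard_normal((11, 3, 3)) + 1j * rng.standard_normal((11, 3, 3))
hpd = [Ai @ Ai.conj().T + np.eye(3) for Ai in A]
Rs, Q0 = hpd[:10], hpd[10]

scales = [1e0, 1e2, 1e4, 1e6]
tsl_norms = [np.linalg.norm(tsl_influence(Rs, s * Q0)) for s in scales]
mean_norms = [np.linalg.norm(mean_influence(Rs, s * Q0)) for s in scales]
```

As the outlier is scaled up, `mean_norms` grows without limit while `tsl_norms` stays below a fixed constant, which is the qualitative behaviour that the bound describes.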

Numerical Simulations
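In this section, the performances of the adaptive normalized matched filter with the proposed estimators and with the normalized sample covariance matrix (NSCM) are evaluated by means of standard Monte Carlo techniques, as an analytical expression for the probability of detection is not available. In particular, consider a classical target detection problem, which is formulated as follows:
$$ \begin{cases} H_0: \mathbf{z} = \mathbf{r}, \\ H_1: \mathbf{z} = \beta\mathbf{s} + \mathbf{r}, \end{cases} \tag{42} $$
where z and r denote the received signal and the clutter data in the cell under test, respectively. β is an unknown complex quantity, accounting for the target radar cross section and the channel propagation effects. s is the target steering vector,
$$ \mathbf{s} = \frac{1}{\sqrt{N}}\left[1, e^{j2\pi f_d}, \ldots, e^{j2\pi (N-1) f_d}\right]^{T}, \tag{43} $$
where $f_d$ is the normalized Doppler frequency. The terms r and $\mathbf{r}_k$, $k = 1, \ldots, K$, are spherically invariant random vectors, and can be formulated as
$$ \mathbf{r} = \sqrt{\tau}\,\mathbf{g}, \quad \mathbf{r}_k = \sqrt{\tau_k}\,\mathbf{g}_k, \quad k = 1, \ldots, K, \tag{44} $$
where g and $\mathbf{g}_k$ are N-dimensional circularly symmetric zero-mean Gaussian vectors that share the same covariance matrix Σ. τ and $\tau_k$ are positive real random variables, which are also independent and identically distributed. In particular, τ and $\tau_k$ are assumed to follow the inverse gamma distribution,
$$ p(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{-\alpha - 1} \exp\left(-\frac{\beta}{x}\right), \quad x > 0, \tag{45} $$
where α and β denote the shape and scale parameters, respectively, and Γ(·) is the gamma function. For the target detection problem (42), the adaptive normalized matched filter (ANMF) is often used, which is given by
$$ \frac{\left|\mathbf{s}^{\dagger}\hat{\boldsymbol{\Sigma}}^{-1}\mathbf{z}\right|^{2}}{\left(\mathbf{z}^{\dagger}\hat{\boldsymbol{\Sigma}}^{-1}\mathbf{z}\right)\left(\mathbf{s}^{\dagger}\hat{\boldsymbol{\Sigma}}^{-1}\mathbf{s}\right)} \underset{H_0}{\overset{H_1}{\gtrless}} \gamma, \tag{46} $$
where $\hat{\boldsymbol{\Sigma}}$ is an estimator that is based on the secondary data $\mathbf{r}_k$, $k = 1, \ldots, K$. γ denotes the threshold, which is obtained by the Monte Carlo method in order to maintain a constant false alarm rate. In addition, the covariance matrix of g and $\mathbf{g}_k$ is given as the sum of two parts,
$$ \boldsymbol{\Sigma} = \boldsymbol{\Sigma}_0 + \mathbf{I}, \tag{47} $$
where I accounts for the thermal noise. $\boldsymbol{\Sigma}_0$ is related to the clutter, and is modeled as
$$ \boldsymbol{\Sigma}_0(i, k) = \sigma_c^{2}\, \rho^{|i - k|}\, e^{j2\pi f_{dc}(i - k)}, \quad i, k = 1, \ldots, N, \tag{48} $$
where ρ is the one-lag correlation coefficient, $\sigma_c^2$ is the clutter-to-noise power ratio, and $f_{dc}$ is the clutter normalized Doppler frequency. In this simulation, we set ρ = 0.9, $f_{dc}$ = 0.1, and $\sigma_c^2$ = 25 dB. The texture parameters are α = 3 and β = 1.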
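The simulation setup above can be sketched in NumPy as follows: an exponentially correlated clutter covariance plus unit thermal noise, compound-Gaussian samples with inverse-gamma texture, and the ANMF statistic. The steering-vector normalization and phase sign convention, the matrix size N = 8, the target Doppler of 0.2, and all function names are assumptions made for this illustration, not values taken from the text.

```python
import numpy as np

def steering_vector(N, fd):
    """Unit-norm Doppler steering vector (sign convention assumed)."""
    return np.exp(2j * np.pi * fd * np.arange(N)) / np.sqrt(N)

def clutter_plus_noise_covariance(N, rho, fdc, cnr_db):
    """Sigma = Sigma_0 + I with Sigma_0(i,k) = cnr * rho^|i-k| * exp(j 2 pi fdc (i-k))."""
    lag = np.arange(N)[:, None] - np.arange(N)[None, :]
    cnr = 10.0 ** (cnr_db / 10.0)
    return cnr * rho ** np.abs(lag) * np.exp(2j * np.pi * fdc * lag) + np.eye(N)

def sirv_samples(K, Sigma, alpha, beta, rng):
    """K compound-Gaussian vectors sqrt(tau_k) g_k, stored as rows."""
    N = Sigma.shape[0]
    L = np.linalg.cholesky(Sigma)
    w = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2)
    g = w @ L.T                      # rows g_k = L w_k, so E[g g^dagger] = Sigma
    # If X ~ Gamma(shape=alpha, rate=beta), then 1/X is inverse gamma(alpha, beta).
    tau = 1.0 / rng.gamma(shape=alpha, scale=1.0 / beta, size=K)
    return np.sqrt(tau)[:, None] * g

def anmf_statistic(z, s, Sigma_hat):
    """|s^H S^-1 z|^2 / ((z^H S^-1 z)(s^H S^-1 s)), a value in [0, 1]."""
    Si_z = np.linalg.solve(Sigma_hat, z)
    Si_s = np.linalg.solve(Sigma_hat, s)
    num = np.abs(np.vdot(s, Si_z)) ** 2
    den = np.vdot(z, Si_z).real * np.vdot(s, Si_s).real
    return num / den

rng = np.random.default_rng(2)
N, K = 8, 16                                   # illustrative sizes
Sigma = clutter_plus_noise_covariance(N, rho=0.9, fdc=0.1, cnr_db=25.0)
s = steering_vector(N, fd=0.2)                 # hypothetical target Doppler
r = sirv_samples(K, Sigma, alpha=3.0, beta=1.0, rng=rng)   # secondary data
z = sirv_samples(1, Sigma, alpha=3.0, beta=1.0, rng=rng)[0]  # cell under test, H0
t = anmf_statistic(z, s, Sigma)                # here with the true covariance
```

In a full experiment the true `Sigma` in the last line would be replaced by an estimate computed from the secondary data `r`. The statistic is unchanged when the estimate or the data are rescaled, which is why the unknown texture does not affect it.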
The performances of ANMFs with our proposed estimators are compared with those of the Fréchet median estimator [18] and the NSCM estimator. The plots of $P_d$ versus signal-to-clutter ratio (SCR) for different values of K, the number of secondary data, are shown in Figures 3 and 4. The $P_d$ is estimated by the relative frequencies over 1000 simulations. The detection threshold is obtained using $100/P_{fa}$ Monte Carlo trials, for a fixed nominal $P_{fa}$. It can be noted from Figures 3 and 4 that the ANMFs with our proposed estimators and with the Fréchet median estimator outperform the ANMF with the NSCM estimator. The ANMF with the TLD estimator has the best performance, followed by the TVN estimator. The Fréchet median estimator has similar performance to the TVN estimator. However, the differences among these estimators are small. Comparing Figures 3 and 4 also indicates that our proposed estimators are robust to the number of secondary data K, unlike the NSCM estimator. In order to analyze the robustness of the total Bregman divergence median and arithmetic mean estimators, we inject an outlier with normalized Doppler frequency $f_d$ = 0.1 into 30 sample data, and compute the influence value according to (28), (29), (30), and (31). The results are shown in Figure 5. From Figure 5, it can be seen that the TLD median estimator has the smallest influence value, followed by the TSL median estimator. Moreover, all three of our proposed estimators have smaller influence values than the arithmetic mean estimator, which is the NSCM estimator. These results imply that our proposed estimators are more robust than the NSCM estimator. To further support these results, we inject an outlier with $f_{dc}$ = 0.15 and SCR = 20 dB. The plots of $P_d$ versus SCR of the ANMF with different estimators are then given in Figures 6 and 7. It is clear from Figures 6 and 7 that the proposed TSL, TLD, and TVN estimators outperform the NSCM in contaminated clutter. Moreover, the performance of the Fréchet median estimator is close to that of our proposed estimators.
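The Monte Carlo procedure for setting the threshold and estimating the detection probability can be sketched as follows. The statistics used here are uniform random stand-ins rather than ANMF outputs, and the function names and the nominal false-alarm probability of 0.01 are hypothetical choices for illustration.

```python
import numpy as np

def monte_carlo_threshold(h0_stats, pfa):
    """Threshold exceeded by a fraction pfa of the H0 statistics (absent ties)."""
    stats = np.sort(np.asarray(h0_stats))
    n = stats.size
    k = int(round(pfa * n))              # number of allowed false alarms
    if not 1 <= k < n:
        raise ValueError("need more H0 trials for this false-alarm probability")
    return stats[n - k - 1]              # exactly k sorted values lie above this one

def detection_probability(h1_stats, threshold):
    """Relative frequency of H1 statistics that exceed the threshold."""
    return float(np.mean(np.asarray(h1_stats) > threshold))

rng = np.random.default_rng(3)
pfa = 1e-2                               # illustrative nominal value
n_h0 = int(round(100 / pfa))             # 100 / Pfa trials under H0
h0 = rng.random(n_h0)                    # stand-in H0 statistics, NOT ANMF outputs
h1 = 0.5 + 0.5 * rng.random(1000)        # stand-in H1 statistics, 1000 trials

gamma = monte_carlo_threshold(h0, pfa)
pd = detection_probability(h1, gamma)
```

Using 100/P_fa trials means that about 100 of the H0 statistics fall above the threshold, which keeps the relative error of the empirical false-alarm rate moderate.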

Conclusions
In this work, we have proposed a covariance estimation method based on information geometry for heterogeneous clutter. The problem of covariance estimation has been reformulated as the computation of a geometric median related to a measure. In particular, three tBDs, namely the TSL, the TLD, and the TVN, have been proposed on the Riemannian manifold. Then, we have derived the TSL, the TLD, and the TVN median estimators. At the analysis stage, the results of numerical experiments have highlighted that our proposed estimators outperform the NSCM estimator, and have similar performance to the Fréchet median estimator in heterogeneous clutter.