Robust Covariance Estimators Based on Information Divergences and Riemannian Manifold

This paper proposes a class of covariance estimators based on information divergences for heterogeneous environments. In particular, the problem of covariance estimation is reformulated on the Riemannian manifold of Hermitian positive-definite (HPD) matrices. The means associated with several information divergences are derived and used as estimators. These mean estimators account for the geometry of the Riemannian manifold of HPD matrices and do not require complete knowledge of the probability distribution of the sample data. Moreover, the robustness of the mean estimators is analyzed using the influence function. Simulation results indicate the robustness and superiority of an adaptive normalized matched filter equipped with the proposed estimators compared with existing alternatives.


Introduction
Covariance estimation plays an important role in adaptive signal processing, such as multichannel signal processing [1,2], space-time adaptive processing (STAP) [3,4], and radar target detection. Conventional covariance estimation methods, derived from the maximum likelihood (ML) of the clutter data, are based on the assumption that the clutter remains stationary and homogeneous during the adaptation process. However, in real heterogeneous clutter, the number of secondary data over which the clutter is homogeneous is very limited, and the estimation performance degrades seriously. It is therefore necessary to improve the performance of covariance estimation in heterogeneous clutter.
A commonly used strategy to improve covariance estimation is to exploit a priori information about the clutter environment. For instance, geographic information has been used for covariance estimation, and the target detection performance of the resulting estimator is significantly improved [5]. In [6], the authors employ a Bayesian method to perform target detection in an interference environment, where the unknown covariance matrix is assumed to follow a suitable probability model. In [7,8], a Bayesian framework is used together with structural information about the estimated covariance. In [9], a condition-number upper-bound constraint is imposed on the covariance estimation problem to achieve a signal-to-noise ratio improvement. Moreover, a symmetrically structured spectral density constraint has been applied to covariance estimation, and the results show the superiority of that estimator [10]. All of these methods rely on knowledge of the statistical characterization of the clutter data. However, the probability distribution of the whole environment is difficult to obtain, and a mismatched distribution results in a remarkable degradation of the estimation performance in heterogeneous clutter.
Many covariance estimation algorithms derived from the geometry of the matrix space, which do not resort to the statistical characterization of the sample data, have been reported in the literature. For instance, the Riemannian mean has been used for monitoring wake turbulence [11,12] and for target detection in HF and X-band radar [13]. In [14][15][16], the Riemannian mean and median are employed for covariance estimation in STAP, and results have shown that the projection algorithm with the Riemannian mean can yield significant performance gains. In [17,18], geometric barycenters and medians are proposed for radar training data selection in a homogeneous environment. More recently, we have explored information divergence means and medians for target detection in non-Gaussian clutter [19][20][21]. Moreover, in image processing applications, the Bhattacharyya mean and median are used for filtering and clustering of diffusion tensor magnetic resonance images [22,23]. In [24], the Log-Euclidean mean, together with a reproducing kernel Hilbert space mapping, is used for texture recognition. These geometric approaches have achieved good performance.
In this paper, a class of covariance estimators based on information divergences is proposed for heterogeneous clutter. In particular, six means related to geometric measures are derived on the Riemannian manifold of Hermitian positive-definite (HPD) matrices. These means do not rely on knowledge of the statistical characterization of the sample data, and the geometry of the matrix space is taken into account. Moreover, the robustness of the means to injected outliers is analyzed via the influence function. Simulation results are given to validate the superiority of the proposed estimators.
The rest of this paper is organized as follows. In Section 2, we reformulate the problem of covariance estimation on the Riemannian manifold. In Section 3, the geometry of the Riemannian manifold of HPD matrices is presented; in particular, six distance measures are given on the Riemannian manifold, and the means associated with these measures are derived. The robustness of the means is analyzed via the influence function in Section 4. Then, we evaluate the performance of an adaptive normalized matched filter with the geometric means as well as with the normalized sample covariance matrix in Section 5. Finally, conclusions are provided in Section 6.

Notation
Here is some notation used throughout this article. A matrix A and a vector x are denoted by uppercase bold and lowercase bold letters, respectively. The conjugate transpose of a matrix A is denoted by A^H. tr(A) is the trace of matrix A. |A| is the determinant of matrix A. I denotes the identity matrix. Finally, E(·) denotes the statistical expectation.

Problem Reformulated on the Riemannian Manifold
A heterogeneous environment is considered for covariance estimation. For a set of K secondary data {r_1, . . . , r_K}, the normalized sample covariance matrix (NSCM) R̂, based on the ML of the probability distribution of the sample data, is estimated as

R̂ = (N/K) Σ_{k=1}^{K} r_k r_k^H / (r_k^H r_k),    (1)

where N is the dimension of the vector r_k. The r_k, k = 1, . . . , K, are modeled as compound-Gaussian random vectors and can be expressed as

r_k = √(τ_k) g_k,    (2)

where τ_k is a nonnegative scalar random variable, and g_k is an N-dimensional circularly symmetric zero-mean vector with an arbitrary joint statistical distribution; all g_k share the same covariance matrix. It is clear from Equation (1) that the NSCM is the arithmetic mean of K auto-covariance matrices R_k = N r_k r_k^H / (r_k^H r_k) of rank one. Since knowledge of the probability distribution of the whole environment is difficult to obtain in heterogeneous clutter, the performance of the NSCM is severely degraded. Actually, these K auto-covariance matrices lie in a non-linear Hermitian matrix space, as the sample data are complex. It is well known that HPD matrices form a differentiable Riemannian manifold [25], which is the most studied example of a manifold with non-positive curvature [26]. In order to facilitate the analysis, the matrix R_k is transformed into a positive-definite matrix P_k in one of the following three ways:
(1) P_k is obtained by adding an identity matrix I, as P_k = R_k + I.
(2) A Toeplitz HPD matrix T_k is utilized. As in [13], T_k is built from the correlation coefficients c_k of the sample data, with the entries below the diagonal given by the complex conjugates c̄_i of the c_i.
(3) The HPD matrix S_k is the solution of an optimization problem [17]. The optimal solution is given in [17] in terms of a unitary matrix U_k of the eigenvectors of r_k r_k^H, with the first eigenvector corresponding to the eigenvalue ||r_k||^2, and of the condition number κ_M.
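As an illustration, the NSCM of Equation (1) and the first transformation P_k = R_k + I can be sketched as follows (a minimal NumPy sketch; the function names are ours, not the paper's):

```python
import numpy as np

def nscm(r):
    """Normalized sample covariance matrix of Eq. (1):
    (N/K) * sum_k r_k r_k^H / (r_k^H r_k), for r of shape (K, N)."""
    K, N = r.shape
    outers = np.einsum('ki,kj->kij', r, r.conj())      # rank-one outer products r_k r_k^H
    powers = np.einsum('ki,ki->k', r, r.conj()).real   # ||r_k||^2
    return (N / K) * (outers / powers[:, None, None]).sum(axis=0)

def to_hpd(R_k):
    """Transformation (1): shift the rank-one matrix R_k onto the
    HPD manifold by adding the identity, P_k = R_k + I."""
    return R_k + np.eye(R_k.shape[0])
```

Note that tr(R̂) = N by construction, which gives a quick sanity check on any implementation.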
According to the above transformations, we can obtain HPD matrices, and the problem of covariance estimation can be reformulated on the Riemannian manifold. In general, for a set of m scalar numbers x_1, . . . , x_m, the arithmetic mean is defined as the minimizer of the sum of squared distances to the given points,

x̄ = arg min_x Σ_{i=1}^{m} (x − x_i)^2.    (8)

Equation (8) lets us understand the arithmetic mean from a geometric viewpoint. Similarly, for K HPD matrices P_1, . . . , P_K, the mean related to a measure can be defined as

P̄ = arg min_P Σ_{k=1}^{K} d^2(P, P_k),

where d(·, ·) denotes the measure. It is worth pointing out that the arithmetic mean of Equation (1) is obtained when d is the Frobenius norm and P_k is replaced by R_k. The difference between the arithmetic mean and the geometric mean is shown in Figure 1. As illustrated in Figure 1, the geometric mean is computed on the Riemannian manifold of HPD matrices with a non-Euclidean metric, whereas the arithmetic mean is considered in the Euclidean space. This difference implies that different geometric structures are considered in the two estimators.

The Geometry of Riemannian Manifold of HPD Matrices
In this section, the fundamental mathematical background of this paper is presented. First, the Riemannian manifold of HPD matrices is introduced. Then, six distance measures are presented. Finally, the means related to these measures are derived.

The Riemannian Manifold of HPD Matrices
Let H(n) = {A | A = A^H} denote the space of n × n Hermitian matrices. A Hermitian matrix A is HPD if the quadratic form satisfies x^H A x > 0, ∀x ∈ C(n), where C(n) is the space of n-dimensional complex vectors. The set of all n × n HPD matrices constitutes the positive-definite Hermitian matrix space P(n), which forms a Riemannian manifold of dimension n(n + 1)/2 with non-positive curvature [26]. For a point A on the Riemannian manifold, the infinitesimal arclength ds between A and A + dA defines a metric on the Riemannian manifold [27], and ||·||_F denotes the Frobenius norm of a matrix. The inner product and the corresponding norm on the tangent space at the point A are defined accordingly [28]. For two points P_1 and P_2 on the Riemannian manifold, the affine-invariant (Riemannian) distance is given by [29]

d_R(P_1, P_2) = ||logm(P_1^{-1/2} P_2 P_1^{-1/2})||_F,

where logm is the logarithmic map on the Riemannian manifold of HPD matrices.
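The affine-invariant distance can be computed directly with SciPy's matrix functions; the sketch below (our naming) makes its defining property explicit: the distance is unchanged under any congruence transformation P → A P A^H with invertible A.

```python
import numpy as np
from scipy.linalg import logm, sqrtm

def riemannian_distance(P1, P2):
    """Affine-invariant (Riemannian) distance:
    d_R(P1, P2) = || logm(P1^{-1/2} P2 P1^{-1/2}) ||_F."""
    isq = np.linalg.inv(sqrtm(P1))          # P1^{-1/2}
    return np.linalg.norm(logm(isq @ P2 @ isq), 'fro')
```

This invariance is what distinguishes the Riemannian metric from the Euclidean (Frobenius) metric used by the arithmetic mean.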

The Geometric Measure on the Riemannian Manifold
In addition to the Riemannian distance, many distances and information divergences can be used as measures on the Riemannian manifold. Five such geometric measures are presented in the following.
(1) Log-Euclidean distance. The Log-Euclidean distance is also a geodesic distance. It is defined via the matrix logarithm, which maps the Riemannian manifold diffeomorphically onto the tangent space identified with the Hermitian matrix space. For two points P_1 and P_2, the Log-Euclidean distance d_LE(P_1, P_2) is given by [30]

d_LE(P_1, P_2) = ||logm(P_1) − logm(P_2)||_F.

(2) Hellinger distance. The Hellinger distance is a special case of the α-divergence with α = 0. Given two points P_1 and P_2, the Hellinger distance d_H(P_1, P_2) is defined in [31].

(3) Kullback-Leibler divergence. The Kullback-Leibler (KL) divergence is perhaps the most widely used measure on the Riemannian manifold. The KL divergence is also a special case of the α-divergence, with α = ±1, and is also called the Stein loss or the log-determinant divergence. The KL divergence d_KL(P_1, P_2) between two points P_1 and P_2 is given in [32].

(4) Bhattacharyya distance. The Bhattacharyya distance is a commonly used measure and has been applied to medical image segmentation [22]. In particular, the Bhattacharyya distance is a Jensen version of the KL divergence; d_B(P_1, P_2) denotes the Bhattacharyya distance between two points P_1 and P_2.

(5) Symmetrized Kullback-Leibler divergence. The symmetrized Kullback-Leibler (SKL) divergence is a Jeffreys divergence [33]. It behaves as the square of a distance; however, it is not a true distance, as the triangle inequality does not hold. d_SKL(P_1, P_2) denotes the SKL divergence between two points P_1 and P_2.
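For concreteness, the measures of this section can be sketched as follows. The Log-Euclidean, KL, Bhattacharyya, and SKL formulas below are the standard log-determinant forms for HPD matrices; the Hellinger implementation uses the Bhattacharyya coefficient of zero-mean Gaussian models, which we assume matches the elided expression in [31] (a minimal NumPy/SciPy sketch):

```python
import numpy as np
from scipy.linalg import logm

def _logdet(P):
    """Log-determinant of an HPD matrix (avoids determinant overflow)."""
    return np.linalg.slogdet(P)[1]

def log_euclidean_distance(P1, P2):
    """d_LE(P1, P2) = || logm(P1) - logm(P2) ||_F."""
    return np.linalg.norm(logm(P1) - logm(P2), 'fro')

def hellinger_distance(P1, P2):
    """d_H = sqrt(1 - BC), with the Bhattacharyya coefficient
    BC = det(P1)^{1/4} det(P2)^{1/4} / det((P1 + P2)/2)^{1/2}.  (Assumed form.)"""
    bc = np.exp(0.25 * _logdet(P1) + 0.25 * _logdet(P2)
                - 0.5 * _logdet((P1 + P2) / 2))
    return np.sqrt(max(1.0 - bc, 0.0))

def kl_divergence(P1, P2):
    """Stein loss: d_KL(P1, P2) = tr(P2^{-1} P1) - log det(P2^{-1} P1) - N."""
    N = P1.shape[0]
    return np.trace(np.linalg.solve(P2, P1)).real - (_logdet(P1) - _logdet(P2)) - N

def bhattacharyya_distance(P1, P2):
    """d_B^2 = log det((P1 + P2)/2) - (1/2) log det(P1 P2)."""
    return np.sqrt(_logdet((P1 + P2) / 2) - 0.5 * (_logdet(P1) + _logdet(P2)))

def skl_divergence(P1, P2):
    """Jeffreys (symmetrized KL): d_SKL = d_KL(P1, P2) + d_KL(P2, P1)."""
    return kl_divergence(P1, P2) + kl_divergence(P2, P1)
```

All five functions vanish when P_1 = P_2 and are positive otherwise; only d_SKL is symmetric among the two KL-based divergences.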

The Geometric Mean for A Set of HPD Matrices
In this section, the means related to the six measures mentioned above are derived using a fixed-point algorithm. These derivations were carried out in our previous article [19]. The six means are summarized in Table 1.

Geometric Measure Mean
where t is the iteration index, and ε_t is the iteration step size.
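As an example of the fixed-point iterations collected in Table 1, the Riemannian mean can be computed by the standard gradient-descent recursion P_{t+1} = P_t^{1/2} expm((ε_t/K) Σ_k logm(P_t^{-1/2} P_k P_t^{-1/2})) P_t^{1/2}; this is a common form which we assume matches the corresponding table entry (a sketch, with our own function name and defaults):

```python
import numpy as np
from scipy.linalg import logm, expm, sqrtm

def riemannian_mean(Ps, eps=1.0, iters=50):
    """Fixed-point iteration for the Riemannian (Karcher) mean of a
    list of HPD matrices Ps, with step size eps."""
    P = sum(Ps) / len(Ps)                    # initialize at the arithmetic mean
    for _ in range(iters):
        sq = sqrtm(P)
        isq = np.linalg.inv(sq)
        # average the matrix logarithms in the tangent space at the current P
        T = sum(logm(isq @ Pk @ isq) for Pk in Ps) / len(Ps)
        P = sq @ expm(eps * T) @ sq          # map back to the manifold
    return P
```

For commuting (e.g., diagonal) matrices, this recursion converges to the elementwise geometric mean, which gives a convenient correctness check.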

Robustness Analysis of Geometric Means
This section is devoted to analyzing the robustness of the geometric means via the influence function. Let P̂ be the mean, associated with a measure, of m HPD matrices {P_1, . . . , P_m}, and let P̃ be the mean obtained by adding a set of n outliers {Q_1, . . . , Q_n}, with a weight ε (ε ≪ 1), to {P_1, . . . , P_m}. Then, we can write P̃ = P̂ + εH(Q), where H(Q) denotes the influence function. In the following, seven propositions are presented.
Proof of Proposition 1. Let F(P) be the objective function, and consider its derivative. Note that P̃ is the mean of the m matrices and n outliers, whereas P̂ is the mean of the m matrices alone; then, P̃ = arg min_P F(P). Substituting P̃ = P̂ + εH(Q) into Equation (22) gives the result.

Proof of Proposition 2. Let F(P) be the objective function. As P̃ is the mean of the m matrices and n outliers, and P̂ is the mean of the m matrices alone, the corresponding optimality condition holds. Using the Taylor expansion of P̃ = P̂ + εH(Q), substituting Equations (28) and (29) into Equation (27), and ignoring the terms containing ε² (since ε ≪ 1), we obtain the result.

Proof of Proposition 3. Let F(P) be the objective function. Note that P̃ is the mean of the m matrices and n outliers, and P̂ is the mean of the m matrices alone; then, P̃ = arg min_P F(P). Using the Taylor expansion of P̃ = P̂ + εH(Q), we have logm(P̃) = logm(P̂) + εH(Q)P̂^{-1}. Substituting Equations (35) and (36) into Equation (34), and ignoring the terms containing ε², the result follows.

Proof of Proposition 4. Let F(P) be the objective function. Note that P̃ is the mean of the m matrices and n outliers, and P̂ is the mean of the m matrices alone. Using the Taylor expansion of P̃ = P̂ + εH(Q), substituting Equations (41) and (42) into Equation (40), and ignoring the terms containing ε², the result follows.

Proof of Proposition 5. As P̃ is the mean of the m matrices and n outliers, and P̂ is the mean of the m matrices alone, the optimality condition holds. Using the Taylor expansion of P̃ = P̂ + εH(Q), substituting Equations (47) and (48) into Equation (46), and ignoring the terms containing ε², the result follows.

Proof of Proposition 6. Let F(P) be the objective function. Note that P̃ is the mean of the m matrices and n outliers, and P̂ is the mean of the m matrices alone. Using the Taylor expansion of P̃ = P̂ + εH(Q), substituting Equations (53) and (54) into Equation (52), and ignoring the terms containing ε², the result follows.

Proposition 7. The influence function of the SKL mean, related to the SKL divergence, of m HPD matrices {P_1, . . . , P_m} and n outliers {Q_1, . . . , Q_n} is given below.

Proof of Proposition 7. Let F(P) be the objective function. Note that P̃ is the mean of the m matrices and n outliers, and P̂ is the mean of the m matrices alone; then, P̃ = arg min_P F(P), with ∇F(P̃) = 0. Using the Taylor expansion of P̃ = P̂ + εH(Q), substituting Equations (59) and (60) into Equation (58), and ignoring the terms containing ε², the result follows.
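The influence functions above can also be checked numerically: contaminate the sample set with an ε-weighted outlier and measure the first-order change of the mean, H(Q) ≈ (P̃ − P̂)/ε. The sketch below uses the Log-Euclidean mean, which has the closed form expm((1/m) Σ_k logm(P_k)); the particular contamination weighting is our own assumption for illustration, not the paper's derivation:

```python
import numpy as np
from scipy.linalg import logm, expm

def log_euclidean_mean(Ps, weights=None):
    """Closed-form Log-Euclidean mean: expm of the (weighted) mean of logs."""
    if weights is None:
        weights = np.full(len(Ps), 1.0 / len(Ps))
    return expm(sum(w * logm(P) for w, P in zip(weights, Ps)))

def empirical_influence(Ps, Q, eps=1e-6):
    """First-order influence of a single outlier Q on the Log-Euclidean mean:
    H(Q) ~ (mean with eps-weighted Q  -  clean mean) / eps.  (Assumed scheme.)"""
    m = len(Ps)
    clean = log_euclidean_mean(Ps)
    w = np.full(m + 1, (1.0 - eps) / m)
    w[-1] = eps                              # weight eps on the outlier
    contaminated = log_euclidean_mean(Ps + [Q], w)
    return (contaminated - clean) / eps
```

An outlier located exactly at the clean mean should have (numerically) zero influence, which mirrors the stationarity conditions used in the proofs.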

Numerical Simulations
In order to gain a better understanding of the superiority of the proposed estimators, simulation results on the performance of an ANMF with the proposed estimators in heterogeneous clutter are presented. As there is no analytical expression for the detection threshold, the standard Monte Carlo technique [34] is utilized. A similar approach has recently been used to solve several problems in different areas, such as physics [35], decision theory [36], engineering [37], computational geometry [38], and finance [39]. The decision rule of the adaptive normalized matched filter (ANMF) is given by [40]

|s^H Σ̂^{-1} r|^2 / ((s^H Σ̂^{-1} s)(r^H Σ̂^{-1} r)) ≷ γ,

where Σ̂ is the estimated clutter covariance, r is the sample datum in the cell under test, and γ denotes the threshold, which is derived by the Monte Carlo method in order to keep the false alarm rate constant. s is the target steering vector, given by

s = (1/√N) [1, e^{j2πf_d}, . . . , e^{j2π(N−1)f_d}]^T,

where f_d is the normalized Doppler frequency. According to Equation (2), the terms r_k, k = 1, . . . , K, are compound-Gaussian random vectors sharing the same covariance matrix Σ = σ_c² Σ_0 + I, where I accounts for the thermal noise and Σ_0 is related to the clutter, modeled as

Σ_0(i, j) = ρ^{|i−j|} e^{j2π f_dc (i−j)},

where ρ is the one-lag correlation coefficient, σ_c² is the clutter-to-noise power ratio, and f_dc is the clutter normalized Doppler frequency. In addition, τ and τ_k are positive, real, independent and identically distributed random variables, assumed to follow the inverse gamma distribution with shape parameter α and scale parameter β; Γ(·) is the gamma function. In the simulation, we set ρ = 0.9, f_dc = 0.1, and σ_c² = 25 dB, with parameters α = 3 and β = 1. In the following, we analyze the performance of an ANMF with the proposed estimators in terms of the detection probability (P_d), also in comparison with the optimum NMF detector, which assumes perfect knowledge of the disturbance covariance matrix. In particular, the positive-definite matrices obtained using transformations (1), (2), and (3) are denoted as the SPDF, THPD, and SPD matrices, respectively.
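The simulation building blocks, namely the steering vector, the clutter covariance model, and the ANMF statistic, can be sketched as follows (standard forms from the compound-Gaussian detection literature; we assume they match the elided equations, and the function names are ours):

```python
import numpy as np

def steering_vector(N, fd):
    """s = (1/sqrt(N)) [1, e^{j2*pi*fd}, ..., e^{j2*pi*(N-1)*fd}]^T."""
    return np.exp(2j * np.pi * fd * np.arange(N)) / np.sqrt(N)

def clutter_covariance(N, rho, fdc, cnr):
    """Sigma = cnr * Sigma_0 + I, with
    Sigma_0[i, j] = rho^{|i-j|} e^{j*2*pi*fdc*(i-j)}."""
    i, j = np.meshgrid(np.arange(N), np.arange(N), indexing='ij')
    S0 = rho ** np.abs(i - j) * np.exp(2j * np.pi * fdc * (i - j))
    return cnr * S0 + np.eye(N)

def anmf_statistic(r, s, Sigma):
    """ANMF test statistic:
    |s^H Sigma^{-1} r|^2 / ((s^H Sigma^{-1} s)(r^H Sigma^{-1} r))."""
    Si = np.linalg.inv(Sigma)
    num = np.abs(s.conj() @ Si @ r) ** 2
    den = (s.conj() @ Si @ s).real * (r.conj() @ Si @ r).real
    return num / den
```

By the Cauchy-Schwarz inequality in the Σ^{-1}-weighted inner product, the statistic lies in [0, 1] and equals 1 when r is proportional to s, which makes the threshold γ scale-free.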
Simulation results are shown in Figure 2 with N = 8 and P_fa = 10^{-4}. It is clear that the proposed estimators have different performances, and all of them outperform the NSCM estimator when K = 10 and K = 16. In particular, the proposed estimators behave differently depending on which positive-definite matrix is utilized. For the SPDF positive-definite matrix, the Bhat estimator, the Hel estimator, and the KL estimator have comparable performances and outperform the others; the SKL estimator has the worst performance, while the KL estimator has the best. However, this ranking changes for the THPD positive-definite matrix: the performances of the KL and SKL estimators are poor, while the performances of the other proposed estimators are close to the optimum. For the SPD positive-definite matrix, the ranking of the proposed estimators is similar to the SPDF case: the KL estimator performs best and the SKL estimator worst; in addition, the performance of the Hel estimator is poor. These results imply that the performances of the proposed estimators depend on the positive-definite matrix used. In order to study the robustness of the proposed estimators via the influence function, 17 positive-definite matrices with an injected outlier are considered. The value of the influence function is computed as in Propositions 1-7, and the simulation is repeated 100 times. The 100 simulation results and the average values of the influence function are shown in Figure 3. From Figure 3, we can see that our proposed estimators are more robust than the NSCM estimator. In particular, the robustness of the SKL estimator is poor, while the KL estimator is the most robust when the SPDF or THPD positive-definite matrix is utilized. For the SPD positive-definite matrix, all proposed estimators have comparable robustness.
It can be concluded that the robustness of the proposed estimators depends on the positive-definite matrix used. It is worth pointing out that the three HPD matrices, namely the SPDF, THPD, and SPD matrices, have different structures. Both the SPDF and the SPD matrices have one largest eigenvalue and N − 1 equal eigenvalues; they differ in the ratio between the maximum and minimum eigenvalues. In particular, this ratio is the constant κ_M for the SPD matrix, while it varies for the SPDF matrix. For the THPD matrix, all eigenvalues are different. Riemannian manifolds composed of different positive-definite matrices have different geometric structures; thus, the estimators associated with different metrics on the Riemannian manifold may behave differently. Figure 4 plots the P_d of the ANMF with our proposed estimators, the NSCM estimator, and the NMF detector in contaminated clutter. An outlier is injected into one reference cell, the number of reference cells K is set to 10 and 16, respectively, and the dimension of the sample data is 8. It can be noted from Figure 4 that the performances of our proposed estimators are not significantly reduced, while there is a degradation in the performance of the NSCM estimator. The performance rankings are similar to the curves of Figure 2. These results sufficiently demonstrate the advantage of our proposed estimators.

Conclusions
In this paper, a class of covariance estimators based on information divergences has been proposed for heterogeneous clutter. In particular, the problem of disturbance covariance estimation is reformulated as finding a geometric mean on the Riemannian manifold. Six mean estimators related to information measures are derived. Moreover, the robustness of the proposed estimators is analyzed via the influence function, and analytic expressions of the influence function are deduced. Finally, the performance advantage and robustness of the proposed estimators are verified by simulation results in a heterogeneous environment.