Heterogeneous Clutter Suppression for Airborne Radar STAP Based on Matrix Manifolds

Abstract: Clutter suppression in heterogeneous environments is a serious challenge for airborne radar. To address this problem, a matrix-manifold-based clutter suppression method is proposed. First, the distributions of training data in heterogeneous environments are analyzed, with the received data characterized on a Riemannian manifold of Hermitian positive definite matrices. The analysis indicates that training data with different distributions but the same power are separated, whereas data with the same distribution lie closer together. This implies that the underlying geometry of the data is revealed better by manifolds than by Euclidean space. Based on these properties, homogeneous training data are selected through a binary hypothesis test, which alleviates the negative effects of using heterogeneous samples. Moreover, because a geometric metric on manifolds is exploited to reveal the underlying information of the data, experimental results on both simulated and real data validate that the proposed method achieves superior performance with small sample support.
[...] of outliers (discrete clutter) are investigated to evaluate the potential of the proposed method. The experiments show that the proposed method is not only competitive in the case of limited samples but also robust to discrete clutter. In future research, the optimization of the geometric metric on manifolds is worth studying. A further research direction concerns fast algorithms for computing the geometric barycenter used to estimate the covariance matrix.


Introduction
Airborne radar plays an important role in detecting aircraft, ships, and vehicles at long ranges [1]. However, due to the platform motion, slowly moving targets are easily masked by strong clutter; hence, clutter suppression is a challenging problem for airborne radar [2,3]. Traditional clutter suppression methods include the displaced phase center antenna (DPCA) [4] and along-track interferometry (ATI) [5]. The DPCA technique imposes a very stringent constraint on the baseline between channels, which limits the clutter suppression performance. To achieve better clutter suppression, space-time adaptive processing (STAP) was developed [6][7][8]. STAP involves the design of a space-time adaptive filter, which requires an accurate estimate of the clutter covariance matrix (CCM) of the cell under test (CUT) [9]. Generally, training data that share the same statistical characteristics as the data of the CUT are used to estimate the CCM. The conventional sample matrix inverse (SMI) approach performs well when the number of training samples is greater than twice the dimensionality of STAP [10]. Unfortunately, this requirement is restrictive, because such a large number of independent and identically distributed (i.i.d.) samples is usually unavailable in heterogeneous environments [11]. Worse still, training data are usually contaminated by outliers (i.e., discrete clutter), which significantly degrades the clutter suppression performance.
To address this problem, considerable research has been performed. Trimming algorithms were designed to eliminate contaminated training samples. Many algorithms concentrate on an effective non-homogeneity detector (NHD) [12,13], in which the generalized inner product (GIP) and the adaptive power residue (APR) are two important criteria. Since certain NHDs depend on the training data used, improved methods were proposed to mitigate the finite sample effect, e.g., a new type of GIP detector based on diagonal loading (LGIP) was reported in [14] and a cyclic training sample selection and cancellation [...] evaluate the dissimilarity metrics, combined with practical needs. The superiority of the proposed method is demonstrated by comparison with competing approaches in the experiments. Our results show that the most homogeneous training data can be extracted effectively and that the number of samples required can be decreased.
Notations: Throughout this paper, scalar quantities are denoted with the italic typeface. Lowercase italic boldface quantities denote vectors and uppercase italic boldface quantities denote matrices. $(\cdot)^H$ denotes the conjugate transpose operation. $\|\cdot\|$ denotes the norm, and $\mathbf{A} \succ 0$ indicates that $\mathbf{A}$ is a positive definite matrix. diag(·) denotes the diagonal operation, $\mathbf{I}$ denotes the identity matrix, trace(·) denotes the trace of a matrix, E[·] denotes the statistical expectation, and ⊗ denotes the Kronecker product.

Array Signal Model and Problem Formulation
Consider a side-looking airborne radar system with $N$ antenna elements in a uniform linear array, as shown in Figure 1. The platform travels with a velocity $v$, and the scatter point on the ground has an azimuth angle $\theta$ and an elevation angle $\phi$. Each sensor in the array transmits $M$ pulses during a coherent processing interval. The radar operating wavelength is $\lambda$.
In an airborne STAP radar system, the space-time snapshot $\mathbf{z}_k$ received by the sensors in the $k$th range bin comprises several components: the target signal $\mathbf{x}_{tk}$, the clutter patch echo $\mathbf{x}_{ck}$, the interference signal $\mathbf{x}_{dk}$ (only discrete clutter is considered here; for notational convenience, it is referred to as an outlier), and the noise $\mathbf{n}$:

$$\mathbf{z}_k = \mathbf{x}_{tk} + \mathbf{x}_{ck} + \mathbf{x}_{dk} + \mathbf{n} = \gamma \mathbf{s} + \sum_{i=1}^{N_c} \gamma_{ci}\, \mathbf{s}_{ci} + \sum_{i=1}^{N_d} \gamma_{di}\, \mathbf{s}_{di} + \mathbf{n}, \qquad (1)$$

where $N_c$ and $N_d$ indicate the numbers of clutter patches and outliers, $\gamma_{ci}$ and $\gamma_{di}$ represent the complex amplitudes of the $i$th clutter patch and the $i$th outlier, $\gamma$ is the scatter reflection coefficient of the target signal, and $\mathbf{s}$ represents the steering vector of the target signal,

$$\mathbf{s} = \mathbf{s}_t \otimes \mathbf{s}_s, \qquad \mathbf{s}_t = \left[1, e^{j2\pi f_t}, \ldots, e^{j2\pi (M-1) f_t}\right]^T, \qquad \mathbf{s}_s = \left[1, e^{j2\pi f_s}, \ldots, e^{j2\pi (N-1) f_s}\right]^T,$$

with $\mathbf{s}_t$ and $\mathbf{s}_s$ representing the Doppler and spatial steering vectors of the target, respectively, where $f_t = (2vT/\lambda)\cos\theta\cos\phi$, $T$ is the pulse repetition interval, $f_s = (d/\lambda)\cos\theta\cos\phi$, and $d$ is the spacing between antenna elements. Correspondingly, $\mathbf{s}_{ci}$ and $\mathbf{s}_{di}$ are the steering vectors of the $i$th clutter patch and the $i$th outlier, respectively. We denote the training data in the $k$th range cell without targets as $\mathbf{x}_k$. Then, to form the space-time adaptive filter, the widely used SMI estimate of the CCM takes the form

$$\hat{\mathbf{R}} = \frac{1}{K}\sum_{k=1}^{K} \mathbf{x}_k \mathbf{x}_k^H = \frac{1}{K}\sum_{k=1}^{K} \hat{\mathbf{R}}_k, \qquad (5)$$

where $K$ is the number of training data and $\hat{\mathbf{R}}_k = \mathbf{x}_k \mathbf{x}_k^H$ is the CCM in the $k$th range bin.
After applying the space-time filter, the improvement factor (IF) [9] is usually used to assess the clutter suppression performance:

$$\mathrm{IF} = \frac{\mathrm{SCNR}_o}{\mathrm{SCNR}_i} = \frac{\left|\mathbf{w}^H \mathbf{s}\right|^2 (\mathrm{CNR} + 1)\, \sigma_n^2}{\mathbf{w}^H \mathbf{R}\, \mathbf{w}}, \qquad (6)$$

where $\mathrm{SCNR}_o$ and $\mathrm{SCNR}_i$ are the output and input signal-to-clutter-plus-noise ratios (SCNR), respectively. The noise power is $\sigma_n^2$, CNR represents the clutter-to-noise ratio, and $\mathbf{R}$ is the CCM in the CUT. The adaptive weights $\mathbf{w}$ for clutter suppression can be calculated by

$$\mathbf{w} = \frac{\hat{\mathbf{R}}^{-1}\mathbf{s}}{\mathbf{s}^H \hat{\mathbf{R}}^{-1}\mathbf{s}}. \qquad (7)$$

Note that conventional clutter suppression methods based on SMI achieve satisfactory performance when there are sufficient homogeneous samples. However, in heterogeneous environments, the shortage of homogeneous samples makes the CCM estimate inaccurate, which leads to a significant loss in clutter suppression performance. Hence, improved clutter suppression methods for heterogeneous environments should be investigated.
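The processing chain described in this section (steering vector, SMI estimate, adaptive weights, improvement factor) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: it assumes the standard exponential form of the Doppler and spatial steering vectors, MVDR-normalized weights, and the common IF expression $|\mathbf{w}^H\mathbf{s}|^2(\mathrm{CNR}+1)\sigma_n^2/(\mathbf{w}^H\mathbf{R}\mathbf{w})$. All function names are hypothetical.

```python
import numpy as np

def steering_vector(f_t, f_s, M, N):
    """Space-time steering vector s = s_t (Kronecker) s_s, length M*N."""
    s_t = np.exp(2j * np.pi * f_t * np.arange(M))  # Doppler (temporal) part
    s_s = np.exp(2j * np.pi * f_s * np.arange(N))  # spatial part
    return np.kron(s_t, s_s)

def smi_covariance(X):
    """SMI estimate: average of x_k x_k^H over the K columns of X."""
    return X @ X.conj().T / X.shape[1]

def adaptive_weights(R_hat, s):
    """Assumed MVDR form: w = R^{-1} s / (s^H R^{-1} s)."""
    Ri_s = np.linalg.solve(R_hat, s)
    return Ri_s / (s.conj() @ Ri_s)

def improvement_factor(w, s, R, cnr, sigma2_n):
    """Assumed IF form, returned as a linear ratio (not in dB)."""
    num = np.abs(w.conj() @ s) ** 2 * (cnr + 1.0) * sigma2_n
    return num / np.real(w.conj() @ R @ w)

# Toy usage: noise-only training data, so the true covariance is the identity.
rng = np.random.default_rng(0)
M, N, K = 8, 10, 200
s = steering_vector(0.25, 0.1, M, N)
X = (rng.standard_normal((M * N, K)) + 1j * rng.standard_normal((M * N, K))) / np.sqrt(2)
w = adaptive_weights(smi_covariance(X), s)
print(10 * np.log10(improvement_factor(w, s, np.eye(M * N), cnr=0.0, sigma2_n=1.0)))
```

With an exact identity covariance the weights reduce to a matched filter and the IF equals $MN$; the gap between that value and the printed one reflects the loss caused by estimating the covariance from a finite number of samples.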

Training Data Selection Based on Manifolds in Heterogeneous Environments
In this section, a Riemannian manifold of Hermitian positive definite (HPD) matrices is established first. Then, the properties of the distribution of training data in heterogeneous environments are discussed. Based on these properties, training data are selected using a binary hypothesis test.

The Establishment of a Matrix Manifold
Notice that the CCM $\hat{\mathbf{R}}$ estimated in (5) can also be viewed as the arithmetic mean of the sample matrices $\hat{\mathbf{R}}_k$ in Euclidean space, which is generalized by the following optimization problem [33]:

$$\hat{\mathbf{R}} = \arg\min_{\mathbf{R} \in \mathcal{R}(m)} \sum_{k=1}^{K} \varpi_k\, d_F^2\!\left(\hat{\mathbf{R}}_k, \mathbf{R}\right), \qquad (8)$$

where $d_F$ represents the distance in Euclidean space, and $\mathcal{R}(m) = \mathrm{span}\{\mathbf{R}_i\}_{i=1}^{m}$ is the $m$-dimensional linear space spanned by the observed matrices. The weight $\varpi_k$ (with $\varpi_k > 0$ and $\sum_{k=1}^{K} \varpi_k = 1$) allows the weighting of the training data in each range bin; with $\varpi_k = 1/K$, the minimizer is the arithmetic mean in (5). Moreover, the distance commonly used in (8) can be reformulated as

$$d_F^2\!\left(\hat{\mathbf{R}}_k, \mathbf{R}\right) = \left\|\hat{\mathbf{R}}_k - \mathbf{R}\right\|_F^2 = \mathrm{trace}\!\left[\left(\hat{\mathbf{R}}_k - \mathbf{R}\right)\left(\hat{\mathbf{R}}_k - \mathbf{R}\right)^H\right].$$

Unfortunately, this distance only reflects the signal power and does not make use of the data structure. To overcome this drawback, we focus on the properties of the data. Herein, the covariance matrix, which implicitly captures the second-order statistical characteristics of the received data in each range bin, is employed. Specifically, the CCM $\mathbf{R}_k$ in each range bin is constructed as

$$\mathbf{R}_k = E\!\left[\mathbf{x}_k \mathbf{x}_k^H\right] = \begin{bmatrix} r_0 & \bar{r}_1 & \cdots & \bar{r}_{m-1} \\ r_1 & r_0 & \cdots & \bar{r}_{m-2} \\ \vdots & \vdots & \ddots & \vdots \\ r_{m-1} & r_{m-2} & \cdots & r_0 \end{bmatrix}, \qquad r_l = E\!\left[x_{n+l}\, \bar{x}_n\right], \quad 0 \le l \le m-1, \qquad (9)$$

where the overbar denotes complex conjugation, the coefficient $r_l$ can be obtained by an average over time instead of the statistical expectation, and $n$ represents the slow time. The CCM (not only $\mathbf{R}_k$, but also $\hat{\mathbf{R}}_k$) is positive semi-definite; thus, the space constituted by the covariance matrices is not closed under linear operations such as subtraction and multiplication by arbitrary scalars (e.g., multiplying a positive definite matrix by a negative scalar makes it negative definite) [34]. Hence, the space of covariance matrices is not linear, and the CCM estimation through Euclidean space (a linear space) in (8) is not optimal. To demonstrate this further, we generate a set of 2 × 2 positive definite matrices according to the Wishart distribution and present the space constituted by these covariance matrices in Figure 2. Herein, $x_{ij}$ represent the coordinates, and the red circles represent covariance matrices.
As observed, the Euclidean distance (the blue dotted line) between two points (points A and B) of the CCMs is neither informative nor accurate. This confirms the subpar performance of the conventional estimation methods in Euclidean space.
Fortunately, a manifold $\mathcal{P}(m)$ is a space that is locally diffeomorphic to a Euclidean space. Intuitively speaking, a manifold is a nonlinear space with a locally linear structure. As seen in Figure 2, the space of the covariance matrices lacks a global linear structure but has a local linear structure. Moreover, geometric methods on manifolds are useful for handling problems in nonlinear spaces. Therefore, it is logical to consider our problem on manifolds. Based on this idea, we establish a Riemannian manifold of HPD matrices (also referred to as the "matrix manifold") to handle the clutter suppression problem in heterogeneous environments. First, the space of Hermitian matrices within the set of matrices $M(m, \mathbb{C})$ is expressed as

$$\mathcal{H}(m) = \left\{\mathbf{A} \in M(m, \mathbb{C}) \;\middle|\; \mathbf{A} = \mathbf{A}^H\right\}, \qquad (10)$$

where $m$ is the dimension, and the space of all the HPD matrices $\mathcal{P}(m)$ can be given by

$$\mathcal{P}(m) = \left\{\mathbf{R} \in \mathcal{H}(m) \;\middle|\; \mathbf{R} \succ 0\right\}, \qquad (11)$$

where $\mathbf{R} \succ 0$ indicates that $\mathbf{R}$ is positive definite. The space $\mathcal{P}(m)$ is a manifold endowed with a Riemannian distance. The distance on the manifold $\mathcal{P}(m)$ between $\mathbf{R}_1$ and $\mathbf{R}_2$ is the infimum of the lengths of the curves connecting them:

$$d_{\mathcal{P}}(\mathbf{R}_1, \mathbf{R}_2) = \inf\left\{L(\gamma) \;\middle|\; \gamma : [0, 1] \to \mathcal{P}(m),\ \gamma(0) = \mathbf{R}_1,\ \gamma(1) = \mathbf{R}_2\right\}, \qquad (12)$$

where $L(\gamma)$ is the length of $\gamma$, which is a sufficiently smooth curve in $\mathcal{P}(m)$.
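As a rough illustration of how one snapshot can be mapped to a point on the HPD manifold, the sketch below builds a Hermitian Toeplitz covariance matrix from time-averaged correlation coefficients and checks positive definiteness. The Toeplitz layout and the biased estimator (division by $m$ at every lag, which keeps the matrix positive semi-definite) are assumptions made for this example; the function names are hypothetical.

```python
import numpy as np

def toeplitz_covariance(x):
    """Hermitian Toeplitz covariance from one snapshot x of length m.

    r_l is a (biased) time average of x[n + l] * conj(x[n]); entry (i, j)
    is r_{i-j} below the diagonal and its conjugate above it.
    """
    x = np.asarray(x, dtype=complex)
    m = x.size
    r = np.array([np.sum(x[l:] * np.conj(x[:m - l])) / m for l in range(m)])
    idx = np.arange(m)
    lag = idx[:, None] - idx[None, :]           # row index minus column index
    return np.where(lag >= 0, r[np.abs(lag)], np.conj(r[np.abs(lag)]))

def is_hpd(R, tol=1e-10):
    """True if R is Hermitian with strictly positive eigenvalues."""
    return bool(np.allclose(R, R.conj().T) and np.linalg.eigvalsh(R).min() > tol)

rng = np.random.default_rng(0)
x = rng.standard_normal(8) + 1j * rng.standard_normal(8)
R = toeplitz_covariance(x)
print(is_hpd(R), is_hpd(-R))   # scaling by a negative number leaves the HPD set
```

The second check mirrors the argument in the text: the set of HPD matrices is not closed under multiplication by negative scalars, so it cannot be treated as a linear space.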

The Properties of the Distribution of Training Data in Heterogeneous Environments
As described in the previous subsection, the covariance matrices of the training data can be regarded as different points on the manifold. To reflect the underlying geometry of the data better, the distributions of these points in heterogeneous environments are analyzed in this subsection. To analyze the characteristics of the training data, it is necessary to map the high-dimensional data into two dimensions while maintaining the local structure of the data. Therefore, the manifold learning algorithm t-SNE [35] is used to examine the distribution of the training data.
Two typical cases in heterogeneous environments that affect the clutter suppression performance of STAP are discussed. In the first case, the training data obey a distribution different from that of the CUT; in the second, the training data are corrupted by outliers. For ease of discussion, a standard STAP system that employs a uniform linear array with half-wavelength inter-element spacing is considered. By applying the t-SNE algorithm, the distributions of the covariance matrices in heterogeneous environments are obtained, as shown in Figure 3. In the scene, there are 10 array elements, 8 pulses in a coherent processing interval (CPI), and 50 range bins. In Figure 3a, we assume that the data of different range bins obey different distributions but have the same power. For the sake of brevity, we label the data in the range bins that obey the K, normal, lognormal, Weibull, and Rayleigh distributions as 1 to 5, respectively. The covariance matrix in each range bin is constructed by (9). Therefore, different points in Figure 3a correspond to the labelled covariance matrices of different range bins. Similarly, the case of training samples contaminated by outliers is considered in Figure 3b. In this case, the clutter data of all 50 range bins obey a normal distribution, but there are five outliers among them. Specifically, the data in the 1st to 45th range bins are homogeneous training data with a power of 5 dB, whereas the clutter power in the 46th and 47th range bins is 10 dB and that in the 48th to 50th range bins is 15 dB. The covariance matrices of the different range bins are again obtained according to (9). The range bins of homogeneous training data (1st to 45th) are labeled 1, and the range bins of training samples contaminated by outliers are labeled 2 (46th and 47th) and 3 (48th to 50th).

As observed in Figure 3, the points of training data with different distributions are separated, and those with the same distribution are closer together. Similarly, the points of the homogeneous samples are close to one another, whereas the outliers are far away from the homogeneous samples. The results imply that, while ignoring the signal power, the underlying geometry of the data (usually discarded by the traditional methods in Euclidean space) can be better revealed by the manifold. These properties may be exploited to select homogeneous samples and to estimate the CCM in heterogeneous environments.

The Screening of Training Data
We assume that the statistical characteristics of the data in a presumed range cell (for example, a cell adjacent to the CUT, separated from it by several guard cells) are the same as those of the CUT, and denote the CCM of the presumed cell as $\mathbf{R}_0$. Thus, the training data sharing the same distribution can be selected by solving a binary classification problem. We denote the homogeneous data set as $Q_1$ and the heterogeneous data set as $Q_2$. Then, the screening problem of training data is transformed into the following binary hypothesis test:

$$H_0: \mathbf{R}_k \in Q_1, \qquad H_1: \mathbf{R}_k \in Q_2. \qquad (13)$$

The decision rule for problem (13) is based on the Euclidean distance $d$ between the corresponding points after the manifold learning algorithm, that is:

$$d(\mathbf{R}_k, \mathbf{R}_0) \underset{H_0}{\overset{H_1}{\gtrless}} \eta. \qquad (14)$$

The threshold $\eta$ can be set by guaranteeing a given probability $p_e$ of classifying data as heterogeneous when they are homogeneous. In our method, the threshold is obtained through $100\,p_e^{-1}$ independent Monte Carlo trials. After the screening of training data, the number of homogeneous samples is usually far smaller than the number of degrees of freedom (DoF). Therefore, clutter suppression with limited samples is investigated in the next section.
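The threshold selection and the screening step can be sketched as follows, assuming the covariance matrices have already been embedded as low-dimensional points (for example by t-SNE). The sampler `h0_distance` is a hypothetical stand-in for "draw one homogeneous sample and return its distance to the reference point"; in practice it would be replaced by a simulation of the actual clutter model.

```python
import numpy as np

def monte_carlo_threshold(sample_h0_distance, p_e, rng, n_trials=None):
    """Empirical eta such that P(d > eta | homogeneous) is roughly p_e."""
    if n_trials is None:
        n_trials = int(round(100 / p_e))        # 100 / p_e trials, as in the text
    d = np.array([sample_h0_distance(rng) for _ in range(n_trials)])
    return np.quantile(d, 1.0 - p_e)

def screen_training_data(points, ref_point, eta):
    """Indices of range bins declared homogeneous (distance <= eta)."""
    d = np.linalg.norm(points - ref_point, axis=1)
    return np.flatnonzero(d <= eta)

def h0_distance(rng):
    """Hypothetical H0 model: embedded homogeneous points ~ N(0, I) in 2-D."""
    return np.linalg.norm(rng.standard_normal(2))

rng = np.random.default_rng(1)
eta = monte_carlo_threshold(h0_distance, p_e=0.01, rng=rng)
points = np.array([[0.1, 0.0], [0.0, -0.5], [1.0, 1.0], [10.0, 0.0], [0.0, -8.0]])
selected = screen_training_data(points, ref_point=np.zeros(2), eta=eta)
print(eta, selected)
```

Using an empirical quantile keeps the false-rejection probability close to $p_e$ without requiring a closed-form distribution for the distance, at the cost of needing many trials when $p_e$ is small.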

Clutter Suppression Based on Matrix Manifolds
Although selecting the training data removes the influence of heterogeneous samples, it also reduces the number of available samples. In this section, we propose a novel clutter suppression method for STAP in which the CCM estimation problem is converted into a geometric centroid optimization on manifolds. To obtain a more accurate and robust centroid, criteria that reflect practical needs are proposed to evaluate the dissimilarity metrics. Moreover, a specific scheme of clutter suppression on manifolds is given.

Clutter Covariance Matrix Estimation on Matrix Manifolds
As stated before, the space of covariance matrices is nonlinear, but the existing methods in Euclidean space search for the estimated CCM under a single linear space constraint (as shown in (8)). Meanwhile, the manifold is nonlinear and is better suited to reflecting the underlying geometry of the data. Therefore, by generalizing the constraint to a manifold, the estimation can be performed directly on the nonlinear search space. As such, the CCM estimation in our scenario is recast as the following unconstrained optimization problem:

$$\hat{\mathbf{R}} = \arg\min_{\mathbf{R} \in \mathcal{P}(m)} \sum_{k=1}^{K} d_{\mathcal{P}}^{\,q}\!\left(\mathbf{R}_k, \mathbf{R}\right), \qquad (15)$$

where $d_{\mathcal{P}}^{\,q}(\mathbf{R}_k, \mathbf{R})$ represents the dissimilarity metric on manifolds. When $q = 1$, the matrix $\hat{\mathbf{R}}$ is the geometric median; when $q = 2$, $\hat{\mathbf{R}}$ is the geometric mean. Herein, we refer to the centroid of points measured along the manifold surface as the geometric centroid and to the centroid of points measured along the straight-line distance as the arithmetic centroid. For a more intuitive illustration, the difference between the geometric centroid and the arithmetic centroid is shown in Figure 4.
The dissimilarity metric $d_{\mathcal{P}}^{\,q}(\mathbf{R}_k, \mathbf{R})$ can directly be the distance in (12); however, it is not easy to derive closed-form expressions for this distance (at least to the best of the authors' knowledge). Therefore, the crucial question of how to measure the distance and dissimilarity of different points on manifolds efficiently must be addressed. A natural choice for the dissimilarity metric to solve (15) is the affine invariant Riemannian metric

$$d_R^2(\mathbf{R}_1, \mathbf{R}_2) = \left\|\log\!\left(\mathbf{R}_1^{-1/2}\, \mathbf{R}_2\, \mathbf{R}_1^{-1/2}\right)\right\|_F^2 = \sum_{i=1}^{m} \log^2 \lambda_i, \qquad (16)$$

where $\lambda_i$ are the eigenvalues of $\mathbf{R}_1^{-1/2}\mathbf{R}_2\mathbf{R}_1^{-1/2}$. This metric is an important dissimilarity metric but is computationally cumbersome [36]. Because STAP is already computationally demanding, the Riemannian metric is impractical, and a good metric for approximating distances on generalized surfaces is needed. The approximating metric directly affects the extraction and utilization of the structure of the data; thus, different scenarios may have different optimal metrics. For example, maximizing the discriminative features between the target and the background is the key point for target detection, whereas for background estimation it is important to extract the common features from the contaminated data. Therefore, the dissimilarity metric must be chosen with the practical application in mind.
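To make the difference between the arithmetic and geometric centroids concrete, the following sketch computes the affine-invariant Riemannian distance and a geometric mean (the $q = 2$ centroid under that distance) with a standard fixed-point iteration often called the Karcher flow. This is a generic textbook scheme used for illustration, not the estimator adopted later in the paper; the helper names are hypothetical.

```python
import numpy as np

def _eig_fun(A, fun):
    """Apply a scalar function to a Hermitian matrix through its eigenvalues."""
    lam, U = np.linalg.eigh(A)
    return (U * fun(lam)) @ U.conj().T

def riemannian_distance(R1, R2):
    """Affine-invariant distance ||log(R1^{-1/2} R2 R1^{-1/2})||_F."""
    R1_isqrt = _eig_fun(R1, lambda lam: lam ** -0.5)
    lam = np.linalg.eigvalsh(R1_isqrt @ R2 @ R1_isqrt)
    return np.sqrt(np.sum(np.log(lam) ** 2))

def geometric_mean(mats, n_iter=50):
    """Karcher-flow iteration, started from the arithmetic mean."""
    R = np.mean(mats, axis=0)
    for _ in range(n_iter):
        R_sqrt = _eig_fun(R, np.sqrt)
        R_isqrt = _eig_fun(R, lambda lam: lam ** -0.5)
        # Average of the samples mapped to the tangent space at the current R.
        T = np.mean([_eig_fun(R_isqrt @ A @ R_isqrt, np.log) for A in mats], axis=0)
        R = R_sqrt @ _eig_fun(T, np.exp) @ R_sqrt
    return R

A = np.diag([1.0, 4.0])
B = np.diag([4.0, 1.0])
print(np.mean([A, B], axis=0))        # arithmetic centroid
print(geometric_mean([A, B]))         # geometric centroid
print(riemannian_distance(A, B))
```

For these two commuting matrices the arithmetic centroid is $2.5\,\mathbf{I}$, whereas the geometric centroid is $2\,\mathbf{I}$, the matrix analogue of the scalar geometric mean $\sqrt{1 \cdot 4}$. Each iteration needs several eigendecompositions per sample, which illustrates why the Riemannian metric is costly for large STAP dimensions.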

Criteria for Dissimilarity Metric on Manifolds
Four typical dissimilarity metrics are considered in order to obtain an appropriate metric for our problem: the arithmetic distance, the Riemannian distance, the Kullback-Leibler (KL) divergence, and the total skew Jensen divergence [37] (TSJD, belonging to the Bregman divergences [38]). We then investigate which criteria should be used to decide which metric to exploit in our scenario. First, precise estimation with small sample support is required in heterogeneous environments because homogeneous samples are limited. Second, it is important for our method to be statistically robust to outliers. Third, the computational burden becomes problematic for arrays with many elements. Therefore, combining these practical needs, a good dissimilarity metric should be precise with small sample support, be robust, and have a low complexity. Namely, the IF, the robustness to corrupted training data, and the computational efficiency are proposed as the criteria to evaluate the different dissimilarity metrics.
As stated before, Euclidean tools are not adapted to HPD matrices. The distance between $\mathbf{R}$ and $\hat{\mathbf{R}}$ in (16) is employed to assess robustness, where $\mathbf{R}$ and $\hat{\mathbf{R}}$ represent the estimated CCM with and without outliers in the training data, respectively. Since the estimated matrix may have no explicit closed-form expression under the aforementioned dissimilarity metrics, we conduct a quantitative comparison of the IF and robustness properties in a simulated experiment. A standard STAP system that employs 8 array elements and 10 pulses is considered. The clutter data are assumed to obey the K distribution, and the clutter-to-noise ratio (CNR) is set to 40 dB. As we are interested in heterogeneous environments, the IF and robustness performance are discussed under the condition of small sample support. Therefore, the number of training data is set to 32, and an outlier is located at the 16th range bin when robustness is considered. The IF and the distance between $\mathbf{R}$ and $\hat{\mathbf{R}}$ are then calculated according to (6) and (16). For the computational cost, a metric is regarded as having low complexity if it has a closed form. The properties of the aforementioned metrics are summarized in Table 1.
Table 1 indicates that the arithmetic method has a relatively poor IF and poor statistical robustness with small sample support. KL is a convenient way to measure dissimilarity because of its low complexity; however, it is easily biased by the chosen coordinate system and has a poor IF performance. The TSJD provides the desired invariance to the coordinate system, is robust, has a relatively high IF, and has a low computational cost compared with the Riemannian metric. Therefore, TSJD is selected as the dissimilarity metric.
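The robustness criterion (the distance between the estimates obtained with and without an outlier) can be sketched as below. Because the TSJD centroid is not reproduced here, the example compares the arithmetic mean with a log-Euclidean mean, which is only a closed-form stand-in for a geometric estimator and not the method selected in the paper. All names are hypothetical.

```python
import numpy as np

def _eig_fun(A, fun):
    """Apply a scalar function to a Hermitian matrix through its eigenvalues."""
    lam, U = np.linalg.eigh(A)
    return (U * fun(lam)) @ U.conj().T

def riemannian_distance(R1, R2):
    """Affine-invariant distance ||log(R1^{-1/2} R2 R1^{-1/2})||_F."""
    R1_isqrt = _eig_fun(R1, lambda lam: lam ** -0.5)
    lam = np.linalg.eigvalsh(R1_isqrt @ R2 @ R1_isqrt)
    return np.sqrt(np.sum(np.log(lam) ** 2))

def arithmetic_mean(mats):
    return np.mean(mats, axis=0)

def log_euclidean_mean(mats):
    """Stand-in geometric estimator (NOT the TSJD centroid of the paper)."""
    return _eig_fun(np.mean([_eig_fun(A, np.log) for A in mats], axis=0), np.exp)

def robustness_score(estimator, clean, outlier):
    """Distance between the estimates with and without the outlier."""
    return riemannian_distance(estimator(clean + [outlier]), estimator(clean))

# Toy data: 31 identical clean samples plus one strong outlier (32 in total).
clean = [np.eye(2) for _ in range(31)]
outlier = 100.0 * np.eye(2)
score_arith = robustness_score(arithmetic_mean, clean, outlier)
score_geom = robustness_score(log_euclidean_mean, clean, outlier)
print(score_arith, score_geom)
```

A smaller score means the estimate moves less when the outlier is added. In this toy case the arithmetic mean is pulled strongly toward the high-power outlier, while the log-domain average is affected far less.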
To summarize, a specific scheme for clutter suppression on manifolds is shown in Figure 5. Constructing the covariance matrix from a single sample already reduces the number of samples required. However, the reduction achieved by the proposed method stems not only from this covariance matrix construction but also from the data structure on manifolds. A detailed evaluation is given in the subsequent simulations.
As observed in Figure 5, the CCM is first constructed using the received data in each range bin. Based on the CCM, the manifold is established, and then homogeneous samples are selected according to a binary hypothesis test. Afterward, the TSJD is exploited as the geometric dissimilarity metric, and the estimated CCM $\hat{\mathbf{R}}^{(t)}$ is updated iteratively. The update at iteration $t$ involves the skew factor $\alpha$ ($0 < \alpha < 1$), the renormalized weights updated at stage $t$, and the factor $\rho(\mathbf{R}^{(t)}, \mathbf{R}_k)$, which is expressed through the loss function $F$ and its differential $\nabla F$. Generally, the square loss function, logarithms, and exponentials are the typical loss functions. Among them, the square loss function corresponds to the maximum likelihood estimation of the parameters. Moreover, the logarithms and exponentials are more sensitive to noise than the square loss function. Therefore, in this paper, we choose the square loss function, namely, $F(x) = x^2$. Once the CCM $\hat{\mathbf{R}}$ is obtained, the adaptive weights for clutter suppression are calculated by (7).

Experimental Results and Analysis
In this section, experiments on simulated and real data demonstrate the performance of the proposed method. Both competing conventional methods in Euclidean space and other geometric methods based on manifolds are discussed. We denote the arithmetic mean performed on the covariance matrix construction in (9) as the arithmetic method. Then, the LSMI, 3DT, arithmetic, EASTR [18], random matrix improved (RMI) [30], Karcher [32], and m-estimators [39] are compared to illustrate the potential of the proposed method. To compare under fair conditions, the initialization matrix of the methods that need iteration is set to be the identity matrix.

Simulated Data
First, experiments conducted on the simulated data are shown. The main radar parameters of the simulation are listed in Table 2. The Monte Carlo technique is exploited to test the performance and 500 independent Monte Carlo trials are run. For the training data selection experiment, the range bin 25th is assumed to be the CUT, and the data in set k = {10, · · · , 41} ∈ Q 1 are homogenous samples with CUT, which obey a K distribution. Meanwhile, the data in the set k = {[1, · · · , 9], [42, · · · , 50]} ∈ Q 2 are heterogeneous samples, which obey a normal distribution. Herein, the CUT is target free and the CCM of CUT can be viewed as R 0 . The probability p e of classifying data as heterogeneous when they are homogeneous is set to be 10 −4 . Using the probability p e , the threshold is obtained. The comparison result of homogeneous training data selection is shown in Figure 6. It is seen that the homogeneous samples can be selected effectively by the proposed method. Meanwhile, since the heterogeneous training data with CUT are the data that obey different distributions, rather than the data with different power, the GIP test fails to obtain a satisfactory performance.  [1, , 9], [42, ,50] k  = ∈   are heterogeneous samples, which obey a normal distribution. Herein, the CUT is target free and the CCM of CUT can be viewed as 0 R . The probability e p of classifying data as heterogeneous when they are homogeneous is set to be 4 10 -. Using the probability e p , the threshold is obtained. The comparison result of homogeneous training data selection is shown in Figure 6. It is seen that the homogeneous samples can be selected effectively by the proposed method. Meanwhile, since the heterogeneous training data with CUT are the data that obey different distributions, rather than the data with different power, the GIP test fails to obtain a satisfactory performance.
With the selected training data (the number of homogeneous samples in $Q_1$ is 32), the improvement factor (IF) of the different methods with limited samples is depicted in Figure 7. A higher IF corresponds to a larger output SCNR, and a wider notch corresponds to poorer detection of signals with a low Doppler shift. As seen in Figure 7, with small sample support, the proposed method (dubbed the geometric method) attains a higher IF and a sharper notch, a clear improvement over the typical methods in Euclidean space (LSMI, 3DT, and EASTR). Furthermore, the other geometric methods (Karcher, m-estimator, and RMI) are generally better than the typical methods in Euclidean space.
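As a rough illustration of how an IF value of this kind can be computed, the sketch below uses a commonly used definition, $\mathrm{IF} = |w^H s|^2 \,\mathrm{tr}(R) / (w^H R w \cdot s^H s)$. Whether this matches the exact definition used for Figure 7 is an assumption, and the helper names `lsmi_weights` and `improvement_factor` are hypothetical.

```python
import numpy as np

def lsmi_weights(R_hat, s, loading=0.0):
    """Loaded sample-matrix-inversion weights w = (R_hat + loading*I)^{-1} s."""
    n = R_hat.shape[0]
    return np.linalg.solve(R_hat + loading * np.eye(n), s)

def improvement_factor(w, s, R):
    """IF = |w^H s|^2 tr(R) / (w^H R w * s^H s), returned as a linear ratio."""
    num = np.abs(np.vdot(w, s)) ** 2 * np.real(np.trace(R))
    den = np.real(np.vdot(w, R @ w)) * np.real(np.vdot(s, s))
    return float(num / den)

# White-noise sanity check: with R = I the IF equals the number of
# degrees of freedom (4 here).
s = np.ones(4, dtype=complex)
R = np.eye(4, dtype=complex)
w = lsmi_weights(R, s)
if_db = 10.0 * np.log10(improvement_factor(w, s, R))
```

In a STAP setting, `s` would be the space-time steering vector swept over Doppler, and `R_hat` the covariance estimate produced by each competing method.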
To illustrate the impact of the number of homogeneous training data on the proposed method, the output SCNR versus the number of training data is provided in Figure 8. The comparison indicates that the EASTR method performs unsatisfactorily with limited training data. In addition, when the number of samples is less than 60, the proposed method is better than the other methods. As the number of samples increases, the performances of the methods gradually converge. The SCNR of the arithmetic method is always 2 dB lower than that of the proposed method. The proposed method needs fewer training samples because of the geometric metric on manifolds. Different elements of the CCM play different roles in the clutter matrix estimation, and the main diagonal occupies a more important position. When the Frobenius norm in Euclidean space is used to measure dissimilarity, each element is treated equally. With a geometric metric on manifolds, however, the elements pass through an embedding mapping (similar to a weighting, but not linear), so that different elements couple together and play different roles in the estimation. We take the Riemannian metric as an example to illustrate this further.
Denote a set of HPD matrices and compute all the pairwise distances. In each distance, the matrices $R_i$ and $R_j$ are coupled together. If we rearrange the elements of the matrices, the distances will be different. Hence, with small sample support, the metric on manifolds yields a more accurate CCM through the embedding mapping. Then, under the same conditions, the number of samples required for estimation on manifolds can be lower. This verifies that the superior performance of the proposed method is produced not only by the construction of the CCM in (9) but also by the geometric metric on manifolds.
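The coupling argument can be checked numerically. The sketch below assumes the affine-invariant Riemannian distance $d(R_i, R_j) = \lVert \log(R_i^{-1/2} R_j R_i^{-1/2}) \rVert_F$, a standard choice on the HPD manifold; whether it is exactly the form intended here is an assumption. The helper names and the small 3 x 3 matrices are hypothetical. The same rearrangement (exchanging the (0,1) and (0,2) entries together with their Hermitian counterparts) is applied to both matrices: the Frobenius distance is unchanged, while the Riemannian distance is not.

```python
import numpy as np

def riemannian_distance(A, B):
    """Affine-invariant distance between HPD matrices A and B."""
    L = np.linalg.cholesky(A)              # A = L L^H
    L_inv = np.linalg.inv(L)
    M = L_inv @ B @ L_inv.conj().T         # same eigenvalues as A^{-1} B
    lam = np.linalg.eigvalsh(M)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

def swap_entries(R):
    """Exchange entries (0,1) and (0,2), keeping the matrix Hermitian."""
    S = R.copy()
    S[0, 1], S[0, 2] = R[0, 2], R[0, 1]
    S[1, 0], S[2, 0] = R[2, 0], R[1, 0]
    return S

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.0],
              [0.0, 0.0, 2.0]])
B = np.eye(3)                              # unchanged by the swap (zero off-diagonals)
A_sw, B_sw = swap_entries(A), swap_entries(B)

frob_before = np.linalg.norm(A - B, "fro")
frob_after = np.linalg.norm(A_sw - B_sw, "fro")
riem_before = riemannian_distance(A, B)
riem_after = riemannian_distance(A_sw, B_sw)
```

Both rearranged matrices remain positive definite in this example, so the comparison stays on the manifold.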
In addition to the clutter suppression performance, the moving target detection performance is also discussed. We assume a weak target in a strong clutter environment, located at the 240th range bin with a velocity of 4.27 m/s and a spatial angle of 89°. The spatiotemporal spectrum is depicted in Figure 9. The detection probabilities of adaptive matched filter (AMF) detectors with limited samples are evaluated via the Monte Carlo technique. The threshold of the AMF detector is set according to [40]. Figure 10 provides the detection probability $p_d$ with respect to the input SCNR, where $p_{fa} = 10^{-3}$ and the input SCNR varies from −20 to 0 dB. Figure 10 demonstrates that the proposed method achieves better performance than the others.
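For reference, the AMF test statistic in its standard form is $|s^H \hat{R}^{-1} x|^2 / (s^H \hat{R}^{-1} s)$, compared against a threshold. The sketch below computes only this statistic; the threshold rule of [40] is not reproduced, and the function name `amf_statistic` and the toy inputs are hypothetical.

```python
import numpy as np

def amf_statistic(x, s, R_hat):
    """Standard AMF statistic |s^H R^{-1} x|^2 / (s^H R^{-1} s) for Hermitian R_hat."""
    Rinv_s = np.linalg.solve(R_hat, s)          # R^{-1} s
    num = np.abs(np.vdot(Rinv_s, x)) ** 2       # (R^{-1} s)^H x = s^H R^{-1} x
    den = np.real(np.vdot(s, Rinv_s))
    return float(num / den)

# Toy check with identity covariance: the statistic reduces to |s^H x|^2 / (s^H s).
s = np.array([1.0, 0.0], dtype=complex)
x = np.array([2.0, 0.0], dtype=complex)
t = amf_statistic(x, s, np.eye(2, dtype=complex))
```

In a Monte Carlo evaluation of $p_d$, `R_hat` would be replaced by the covariance estimate of each method and the fraction of trials with `t` above the threshold would be counted.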
The influences of the number of pulses and array elements on the performance of the different methods are depicted in Figure 11a,b, respectively. The IF of the proposed method is initially better than that of the others. As the number of array elements and pulses increases, the IF of all methods tends to be consistent.

Figure 9. Spatiotemporal spectrum.

Real Data
To further demonstrate the effectiveness of the proposed method, experiments are performed on the real data file t38pre01v1.mat in the Mountain-Top data set [41,42]. There are 14 antenna elements, 16 coherent pulses, and 403 range bins. A target is located in the 147th range bin with a normalized Doppler frequency of 0.25 and a spatial angle of −15°. Considering the limited-sample scenario, we use the data from range bins 114 to 172 as dataset 1. As the data are relatively homogeneous, three outliers are added to range bins 114 to 116 to test the proposed method. Therefore, the data in the set $k = \{[120, \ldots, 145], [152, \ldots, 172]\} \in Q_1$ are the homogeneous samples, and the data in the set $k = \{114, 115, 116\} \in Q_2$ are the heterogeneous samples. The result of the homogeneous training data selection is shown in Figure 12. The outliers can be identified by their large distance. Figure 13 illustrates the output power of the different methods after applying the space-time adaptive filter. All the algorithms detect the target effectively; however, the clutter residual at the other range bins is weaker for the proposed method than for the others.

To sufficiently demonstrate the improvement produced by the proposed method in the case of limited samples, the data with a presumed target are used as dataset 2. In dataset 2, the presumed target is injected in the range bin, and the data of range bins 66 to 118 are exploited as training samples. The normalized Doppler frequency and spatial angle of the target are also set to 0.25 and −15°, respectively. To perform a quantitative comparison, the averaged output SCNR of the selected range bins (excluding eight range bins around the target) is calculated, where the output SCNR denotes the ratio of the target output power to the averaged output power of the selected range bins. Specific values are reported in Table 3. The results achieved with real data are consistent with those achieved with simulated data.
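The averaged output SCNR described above can be sketched as follows. The function name `averaged_output_scnr_db`, the `guard` parameter, and the synthetic power profile are hypothetical. Interpreting "eight range bins around the target" as four guard bins on each side is an assumption.

```python
import numpy as np

def averaged_output_scnr_db(power, target_bin, guard=4):
    """Target output power over the mean output power of the remaining bins, in dB.

    The target bin and `guard` bins on each side of it are excluded from the mean.
    """
    power = np.asarray(power, dtype=float)
    idx = np.arange(power.size)
    keep = np.abs(idx - target_bin) > guard
    residual = power[keep].mean()
    return float(10.0 * np.log10(power[target_bin] / residual))

# Synthetic filter output (assumption): unit residual power, target 100 times stronger.
power = np.ones(21)
power[10] = 100.0
scnr_db = averaged_output_scnr_db(power, target_bin=10)
```

Applied to real data, `power` would hold the squared magnitude of the space-time filter output in each selected range bin.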

Influence Analysis of Outliers
Based on the previous subsection, the proposed method appears to be suitable for situations with limited samples. Generally, if the training data are contaminated, the clutter suppression performance is greatly degraded. Therefore, to illustrate the impact of outliers, a simulation with outliers is performed using the different methods. The main parameters are consistent with those in Table 2. The number of training data is set to 32, and there are two outliers with high power in the training data. The normalized Doppler frequency and the azimuth of the outliers are −0.3 and 90°, respectively. We evaluate the performance by comparing the SCNR loss ($\mathrm{SCNR}_{loss}$), which is defined by

The SCNR loss against the outlier power with two outliers in the training data is shown in Figure 14. As observed, the SCNR loss of the LSMI, 3DT, EASTR, and RMI methods clearly increases in the presence of outliers, while the SCNR loss of the arithmetic, Karcher, m-estimator, and proposed methods is almost unchanged. Compared with the case without outliers, the outliers affect the arithmetic, Karcher, m-estimator, and proposed methods by less than 0.5 dB, whereas they affect the RMI method by about 15 dB. Hence, the RMI method is easily affected by outliers. Figure 15 shows the SCNR loss against the number of outliers. The proposed method achieves better performance when the number of outliers increases slightly, and it is superior to the other methods in the presence of outliers. The reason can be deduced from the influence function. If there are outliers $Q_j$, $j \in \{1, 2, \ldots, n\}$, in the training data, the influence function $H(Q)$ of the metric of the proposed method (TSJD) can be calculated by
where c is a constant. Hence, the dissimilarity metric is robust to outliers, which allows for its robust performance in heterogeneous environments.

Conclusions
This paper focuses on the heterogeneous clutter suppression problem for airborne radar systems. The training data are characterized on the Riemannian manifold of HPD matrices. It is revealed that training data with different distributions but the same power are separated, while those with the same distribution are close together. With these properties, the heterogeneous samples can be eliminated. Based on these practical needs, an improved clutter suppression method based on manifolds is then proposed with the selected limited samples. The results show that, by taking advantage of the geometric dissimilarity on manifolds, the number of training samples required for clutter suppression can be reduced. The performances of clutter suppression and target detection and the influence of outliers (discrete clutter) are investigated to evaluate the potential of the proposed method. The experiments show that the proposed method is not only competitive in the case of limited samples but also robust to discrete clutter. In future research, it is worth studying the optimization of the geometric metric on manifolds. Another research direction may concern fast algorithms for computing the geometric barycenter used to estimate the covariance matrix.