High-Dimensional Mahalanobis Distances of Complex Random Vectors

In this paper, we investigate the asymptotic distributions of two types of Mahalanobis distance (MD), the leave-one-out MD and the classical MD, for both Gaussian- and non-Gaussian-distributed complex random vectors, when the sample size n and the dimension p increase proportionally with a fixed limiting ratio c = p/n as n → ∞. We investigate the distributional properties of the complex MD when the random samples are independent, but not necessarily identically distributed. Some results regarding the F-matrix F = S_2^{-1} S_1, the product of a sample covariance matrix S_1 (built from the isolated p × 1 observation z_(i)) and the inverse of another sample covariance matrix S_2 (built from the remaining p × (n − 1) observations), are used to develop the asymptotic distributions of the MDs. We generalize the F-matrix results so that independence between the two components S_1 and S_2 of the F-matrix is not required.


Introduction
Mahalanobis distance (MD) is a fundamental statistic in multivariate analysis. It is used to measure the distance between two random vectors or the distance between a random vector and its center of distribution. MD has received wide attention since it was proposed by Mahalanobis [1] in the 1930s. After decades of development, MD has been applied as a distance metric in various research areas, including propensity score analysis, as well as applications in matching [2] and in classification and discriminant analysis [3]. In the former case, MD is used to calculate propensity scores as a measurement of the difference between two objects. In the latter case, the differences among clusters are investigated based on MD. The applications of MD also extend to multivariate calibration [4], psychological analysis [5] and the construction of multivariate process control charts [6]; it is a standard method for assessing the similarity between observations.
As an essential scale for distinguishing objects from each other, the robustness of MD is important for guaranteeing the accuracy of an analysis. Thus, the properties of MD have been investigated over the last decade. Gath and Hayes [7] investigated bounds for extreme MDs. Dai et al. [8] derived a number of marginal moments of the sample MD and showed that it behaves unexpectedly when p is large relative to n. Dai and Holgersson [9] examined the marginal behavior of MDs and determined their density functions. With the distribution derived by Dai and Holgersson [9], one can set inferential boundaries for the MD itself and acquire robust criteria. Besides the developments above, some attention has been paid to the Fisher matrix (F-matrix) and the Beta matrix, which can be considered generalizations of MD.
We provide the definition of the Fisher matrix (see [10]), which is essential here. The Fisher matrices, or simply F-matrices, are an ensemble of matrices with two components each. Let Z_jk, j, k = 1, 2, ..., be either both real or both complex random variable arrays. For p ≥ 1 and n ≥ 1, let Z = (Z_jk : 1 ≤ j ≤ p, 1 ≤ k ≤ n) = (Z_·1, ..., Z_·n) be the array with column vectors Z_·k, and let z_(i) = Z_·i denote the ith observation in Z. This observation is isolated from the rest of the observations, so that the p × 1 vector z_(i) and the remaining p × (n − 1) array Z_(i) are two independent samples. From these we determine the following two matrices, both of size p × p:

S_1 = z_(i) z_(i)^* and S_2 = (n − 1)^{-1} Z_(i) Z_(i)^*,

where * stands for the complex conjugate transpose.
Then the Fisher matrix (or F-matrix) is defined as F = S_2^{-1} S_1. Pillai [11], Pillai and Flury [12] and Bai et al. [13] have studied the spectral properties of the F-distribution. Johansson [14] investigated the central limit theorem (CLT) for random Hermitian matrices, including the Gaussian unitary ensemble. Guionnet [15] established the CLT for non-commutative functionals of large Gaussian random matrices. Bai et al. [16] obtained the CLT for the linear spectral statistics of the Wigner matrix. Zheng [10] extended the moments of the F-matrix to non-Gaussian circumstances under the assumption that the two components S_1 and S_2 of an F-matrix F = S_2^{-1} S_1 are independent. Gallego et al. [17] extended the MD to the setting of a multidimensional normal distribution and studied the properties of MD for this case.
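To make the construction concrete, the following sketch builds a small F-matrix numerically. The rank-one form of S_1 and the (n − 1)^{-1} normalizer of S_2 follow the leave-one-out construction above and should be read as illustrative assumptions, not the paper's exact normalization.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 5, 200

# Complex data array Z (p x n); columns Z_{.k} are the observations.
Z = (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))) / np.sqrt(2)

i = 0                                     # index of the isolated observation
z_i = Z[:, [i]]                           # p x 1 vector z_(i)
Z_rest = np.delete(Z, i, axis=1)          # p x (n-1) array, independent of z_i

S1 = z_i @ z_i.conj().T                   # rank-one component built from z_(i)
S2 = Z_rest @ Z_rest.conj().T / (n - 1)   # sample covariance of the remaining columns

F = np.linalg.inv(S2) @ S1                # the F-matrix F = S2^{-1} S1
# tr(F) equals the quadratic form z_(i)^* S2^{-1} z_(i), a Mahalanobis-type distance
md = (z_i.conj().T @ np.linalg.inv(S2) @ z_i).real.item()
```

Note that the trace of this F-matrix is exactly the Mahalanobis-type quadratic form, which is why spectral results on F-matrices transfer to MDs.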
However, to the best of our knowledge, the properties of MDs with non-Gaussian complex random variables have not been studied much in the literature. In this paper, we derive the CLT for an MD with complex random variables under more general circumstances. The common restrictions on the random sample, such as being independent and identically distributed (i.i.d.), are not required. The independence between the two components S_1 and S_2 of the F-matrix F = S_2^{-1} S_1 is relaxed. We investigate the distributional properties of the complex MD without assuming a normal distribution; the fourth moment of the underlying random variables is allowed to take an arbitrary value. This paper is organized as follows. In Section 2, we introduce the basic definitions concerning different types of complex random vectors, their covariance matrices, and the corresponding MDs. In Section 3, the first moments of different types of MDs and their distributional properties are given. The connection between the leave-one-out MD and the classical MD is derived, and their respective asymptotic distributions under general assumptions are investigated. We end this paper with some concluding remarks and discussion in Section 4.

Some Examples of Mahalanobis Distance in Signal Processing
MD has been applied in different research areas. We give two examples related to applications of MD in signal processing. Example 1. We illustrate our idea with an example. In Figure 1, each point corresponds to the real and imaginary parts of an MD. The circle in red is an inferential boundary whose radius is calculated based on the MDs. The points outside the boundary are considered to be signals, while the points lying inside the circle are detected as noise.
This example has been used in some research; for example, in evaluating the capacity of multiple-input, multiple-output (MIMO) wireless communication systems (see [18] for more details). Denote the number of inputs (or transmitters) and the number of outputs (or receivers) of the MIMO wireless communication system by n_t and n_r, respectively, and assume that the channel coefficients are correlated at both the transmitter and the receiver ends. Then the MIMO channel can be represented by an n_r × n_t complex random matrix H with corresponding covariance matrices Σ_r and Σ_t. In most cases, Σ_r and Σ_t must be replaced by the sample covariance matrices S_r and S_t to represent the channel correlations at the receiver and transmitter ends, respectively. The information processed by this random channel is a random quantity that can be measured by MD. Example 2. Zhao et al. [19] use MD in a fuzzy clustering algorithm in order to reduce the effect of noise on image segmentation. They compare the boundaries of clusters calculated with both the Euclidean distance and the MD. The boundaries calculated with the Euclidean distance are straight lines, which misclassify observations that diverge far from their cluster center, while the boundaries based on MDs are curves that fit the covariance of the cluster better. This example implies that the MD is a more accurate measure of dissimilarity for image segmentation. It can be extended to clustering applications in which the channel signals are complex random variables.
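As a hedged illustration of the detection rule in Example 1, the sketch below generates complex noise with a known (hypothetical, diagonal) covariance Γ, adds a few offset "signal" observations, and flags points whose MD exceeds a boundary radius taken from an empirical noise quantile. All constants here are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n_noise, n_signal = 4, 490, 10

# Hypothetical known noise covariance (diagonal for simplicity).
gamma = np.array([1.0, 2.0, 3.0, 4.0])
scale = np.sqrt(gamma)[:, None]

def sample(m):
    # circularly symmetric complex Gaussian with covariance diag(gamma)
    return scale * (rng.standard_normal((p, m)) + 1j * rng.standard_normal((p, m))) / np.sqrt(2)

noise = sample(n_noise)
signal = sample(n_signal) + (4.0 + 4.0j)       # offset observations acting as signals
Z = np.hstack([noise, signal])

Ginv = np.diag(1.0 / gamma)
md = np.einsum('ij,ik,kj->j', Z.conj(), Ginv, Z).real  # z^* Gamma^{-1} z per column

radius = np.quantile(md[:n_noise], 0.99)       # inferential boundary from the noise MDs
flagged = md > radius
```

The boundary is an empirical 99% noise quantile here; the paper's point is that asymptotic distribution theory for the MD lets one set such a radius analytically instead.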

Preliminaries
In this section, some important definitions of complex random variables and related concepts are introduced, including the covariance matrix of a complex random vector and the corresponding MDs.
We first define the covariance matrix of a general complex random vector, for a dataset of n observations on p complex variables, as given in [20]: Definition 1. Let z_j = (z_1, ..., z_p)^T ∈ C^p, j = 1, ..., n, be a complex random vector with known mean E(z_j) = µ_{z,j}, where z_j = x_j + i y_j, i = √(−1), and x_j, y_j ∈ R^p. Let Γ_{p×p} be the covariance matrix and C_{p×p} the relation matrix of z_j, respectively. Then Γ and C are defined respectively as

Γ = E[(z_j − µ_{z,j})(z_j − µ_{z,j})^*] and C = E[(z_j − µ_{z,j})(z_j − µ_{z,j})^T],

where T denotes the transpose and * the complex conjugate transpose.
Definition 1 gives a general presentation of complex random variables without imposing any distributional assumption on z_j. If we set the components of z_j, i.e. the real random vectors x_j and y_j, to be normally distributed, then we are in a more familiar context in the complex space: circular symmetry. Circular symmetry of a complex normal random variable is the assumption used for the standardized form of complex Gaussian-distributed random variables. The definition of circular symmetry is the following.
Definition 2. A complex random vector z ∈ C^p is circularly symmetric if e^{iφ} z has the same distribution as z for every φ ∈ [0, 2π). For a complex normal vector, this is equivalent to E(z) = 0 together with Cov(Re z) = Cov(Im z) and Cov(Re z, Im z) = −Cov(Im z, Re z), where "Re" stands for the real part and "Im" stands for the imaginary part.
A circularly symmetric normally distributed complex random variable plays the role that the standard normal distribution plays in real space, simplifying derivations on complex random variables. Under this condition, we obtain a simplified probability density function (hereinafter p.d.f.) of a complex normal random vector, which is presented in the following example.
Example 3. The circularly symmetric complex random vector z = (z_1, ..., z_p)^T ∈ C^p with mean vector µ_z = 0 and relation matrix C = 0 has the following p.d.f.:

f(z) = π^{-p} |Γ_z|^{-1} exp(−z^* Γ_z^{-1} z),

where Γ_z is the covariance matrix of z and |·| denotes the determinant.
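A quick numerical check of this setup, as a sketch under the assumption Γ_z = I: drawing the real and imaginary parts independently as N(0, 1/2) produces a circularly symmetric vector whose sample covariance is close to the identity and whose sample relation matrix is close to zero.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 3, 200_000

# Independent N(0, 1/2) real and imaginary parts => E[z z^*] = I, E[z z^T] = 0.
z = (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))) / np.sqrt(2)

Gamma_hat = z @ z.conj().T / n   # sample covariance matrix
C_hat = z @ z.T / n              # sample relation matrix (should be near zero)

def pdf(v, Gamma):
    """p.d.f. of a circularly symmetric complex normal vector, as in Example 3."""
    q = (v.conj().T @ np.linalg.solve(Gamma, v)).real.item()
    return np.exp(-q) / (np.pi ** p * np.linalg.det(Gamma).real)
```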
Based on Example 3, one can see the possibility of transforming the expression of a complex random vector into a product of a complex constant matrix and a real random vector. Let z_j be a complex random vector; then the transformation between the complex random vector and its real components can be presented as

z_j = x_j + i y_j = (I_p, i I_p) (x_j^T, y_j^T)^T.

The transformation offers a different way of inspecting the complex random vector [21]. The complex random vector can be considered as the bivariate form of a real random vector pre-multiplied by a constant matrix. This transformation also offers another way to present the complex covariance matrix in terms of real random vectors. The idea is presented in the following example.
Example 4. The covariance matrix Γ of a p-dimensional zero-mean complex random vector z = x + i y can be represented in terms of the real random vectors x and y as follows:

Γ = E[(x + i y)(x + i y)^*] = Γ_xx + Γ_yy + i (Γ_yx − Γ_xy),

where Γ_xx = E(x x^T), Γ_yy = E(y y^T), Γ_xy = E(x y^T) and Γ_yx = E(y x^T).
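This decomposition can be verified directly. In fact it holds exactly for the sample moment versions as well, since z z^* = (x + i y)(x − i y)^T expands term by term. A minimal check with an arbitrary illustrative construction of correlated parts:

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 3, 1000

# Correlated real and imaginary parts (an arbitrary mixing, for illustration only).
A = rng.standard_normal((2 * p, 2 * p)) / np.sqrt(2 * p)
xy = (A @ A.T + np.eye(2 * p)) @ rng.standard_normal((2 * p, n))
x, y = xy[:p], xy[p:]
z = x + 1j * y

Gamma_hat = z @ z.conj().T / n            # complex sample second moment
Gxx, Gyy = x @ x.T / n, y @ y.T / n       # real-part and imaginary-part moments
Gxy, Gyx = x @ y.T / n, y @ x.T / n       # cross moments

decomposition = (Gxx + Gyy) + 1j * (Gyx - Gxy)
```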

MD on Complex Random Variables
In this section, we introduce two types of MDs given in Definitions 5 and 6. We start with the classical MD which is based on a known mean and a known covariance matrix. The definition is given as follows.
Definition 3. The MD of the complex random vector z_j ∈ C^p, j = 1, ..., n, with known mean µ and known covariance matrix Γ_z is defined as

D(Γ_z, z_j, µ) = (z_j − µ)^* Γ_z^{-1} (z_j − µ). (1)

A complex random vector has both a real and an imaginary component. Thus, we define the corresponding MDs of the two components of a complex random variable.
Definition 4. The MDs of the real and imaginary parts x_j, y_j ∈ R^p, j = 1, ..., n, of a complex random vector z_j ∈ C^p with known mean µ = µ_x + i µ_y and known covariance matrices Γ_xx and Γ_yy are defined as

D(Γ_xx, x_j, µ_x) = (x_j − µ_x)^T Γ_xx^{-1} (x_j − µ_x), (2)
D(Γ_yy, y_j, µ_y) = (y_j − µ_y)^T Γ_yy^{-1} (y_j − µ_y). (3)

Under the definitions above, we can derive the distribution of the MDs of a complex random vector with known mean and covariance. We present the distribution as follows. Proposition 1. Let x_j and y_j be normally distributed with mean 0 and covariance matrices Γ_xx and Γ_yy, respectively. Then D(Γ_xx, x_j, 0) ∼ χ²_p and D(Γ_yy, y_j, 0) ∼ χ²_p. Proof. For the proof of this proposition, the reader can refer to reference [22].
The result of Proposition 1 is employed here to derive the moments of the MDs of the real and imaginary parts below. Proposition 2. The first moment of D(Γ_z, z_j, 0) defined in (1) is given as E[D(Γ_z, z_j, 0)] = p. Proof. The result follows from Proposition 1. Theorem 1. Let the random variables x_j and y_j be normally distributed, and let D(Γ_xx, x_j, 0) and D(Γ_yy, y_j, 0) be as given in (2) and (3). Then their covariance is given as follows:

Cov(D(Γ_xx, x_j, 0), D(Γ_yy, y_j, 0)) = tr((Γ_xx^{-1} ⊗ Γ_yy^{-1}) Φ) − p²,

where ⊗ is the tensor product, Φ = E(x_j x_j^T ⊗ y_j y_j^T) and tr(·) stands for the trace.
Proof. Proposition 1 shows that D(Γ_xx, x_j, 0) ∼ χ²_p and D(Γ_yy, y_j, 0) ∼ χ²_p. In order to derive the covariance of these two MDs, we first derive their cross moment:

E[D(Γ_xx, x_j, 0) D(Γ_yy, y_j, 0)] = E[tr(Γ_xx^{-1} x_j x_j^T) tr(Γ_yy^{-1} y_j y_j^T)] = tr((Γ_xx^{-1} ⊗ Γ_yy^{-1}) E(x_j x_j^T ⊗ y_j y_j^T)) = tr((Γ_xx^{-1} ⊗ Γ_yy^{-1}) Φ).

Thus, the covariance is given as

Cov(D(Γ_xx, x_j, 0), D(Γ_yy, y_j, 0)) = tr((Γ_xx^{-1} ⊗ Γ_yy^{-1}) Φ) − E[D(Γ_xx, x_j, 0)] E[D(Γ_yy, y_j, 0)] = tr((Γ_xx^{-1} ⊗ Γ_yy^{-1}) Φ) − p²,

which completes the proof.
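A Monte Carlo sanity check of Proposition 1, as a sketch with an arbitrary assumed-known covariance: the real-part MD should behave like a χ²_p variable, with mean p and variance 2p.

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 5, 100_000

B = rng.standard_normal((p, p))
Gxx = B @ B.T / p + np.eye(p)                 # an arbitrary known covariance matrix
x = np.linalg.cholesky(Gxx) @ rng.standard_normal((p, n))

# D(Gxx, x_j, 0) = x_j^T Gxx^{-1} x_j for each column j
md = np.einsum('ij,ik,kj->j', x, np.linalg.inv(Gxx), x)
```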
The results presented so far concern a complex random vector with known mean and known covariance matrix. In practice, the mean µ and covariance matrix Γ_z are not always available. Thus, statistics such as the sample mean z̄ = n^{-1} ∑_{j=1}^n z_j and the sample covariance matrix S_z = n^{-1} ∑_{j=1}^n (z_j − z̄)(z_j − z̄)^* are used as substitutes for the population mean and covariance when building the MDs. We introduce the definitions of MDs with sample mean and sample covariance matrix in the following.

Definition 5.
Let z_j ∈ C^p, j = 1, ..., n, be a complex normally distributed random sample. The MD of the complex random vector z_j with sample mean z̄ = n^{-1} ∑_{j=1}^n z_j and sample covariance matrix S_z = (n − 1)^{-1} ∑_{j=1}^n (z_j − z̄)(z_j − z̄)^* is defined as

D_j = D(S_z, z_j, z̄) = (z_j − z̄)^* S_z^{-1} (z_j − z̄). (4)

For the purpose of deriving further distributional properties, we give an alternative definition of an MD: the leave-one-out case. Leave-one-out here means that we remove one observation each time and use the remaining observations to construct the MD.
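A short sketch of this estimated MD. As a consistency check, summing D_j over the sample gives exactly (n − 1)p, since ∑_j (z_j − z̄)(z_j − z̄)^* = (n − 1) S_z.

```python
import numpy as np

rng = np.random.default_rng(5)
p, n = 4, 60
z = (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))) / np.sqrt(2)

zbar = z.mean(axis=1, keepdims=True)
R = z - zbar
Sz = R @ R.conj().T / (n - 1)              # sample covariance matrix
Sinv = np.linalg.inv(Sz)

# D_j = (z_j - zbar)^* Sz^{-1} (z_j - zbar) for each column j
D = np.einsum('ij,ik,kj->j', R.conj(), Sinv, R).real
```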

Definition 6.
Let the leave-one-out sample mean of a complex random vector be z̄_(i) = (n − 1)^{-1} ∑_{j=1, j≠i}^n z_j and the corresponding leave-one-out sample covariance matrix be S_(i) = (n − 1)^{-1} ∑_{j=1, j≠i}^n (z_j − z̄_(i))(z_j − z̄_(i))^*. The leave-one-out MD of the complex random vector z_i is defined as

D_(i) = (z_i − z̄_(i))^* S_(i)^{-1} (z_i − z̄_(i)). (5)

The advantage of leaving one observation out of the dataset is the independence achieved between the removed observation and the sample covariance matrix of the remaining observations. The similarity in structure between the estimated and the leave-one-out MDs can be exploited in theorems on their distributions. The distribution of the leave-one-out MD is derived in the following theorem. The proofs of this theorem and others are relegated to Appendix A, so that readers can follow the results more smoothly. The distribution of a leave-one-out MD can be used to derive the distribution of an estimated MD; we show this in Theorems 3 and 4.
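The sketch below computes the leave-one-out MD, assuming the (n − 1)^{-1} normalizer for S_(i), and numerically verifies the exact algebraic link between the leave-one-out and estimated MDs that is used in Appendix A (where the estimated MD is built with the n^{-1}-normalized covariance).

```python
import numpy as np

rng = np.random.default_rng(6)
p, n = 4, 60
z = (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))) / np.sqrt(2)

def loo_md(z, i):
    """Leave-one-out MD: observation i against the mean/covariance of the rest."""
    rest = np.delete(z, i, axis=1)                    # the other n-1 columns
    zbar_i = rest.mean(axis=1, keepdims=True)         # leave-one-out sample mean
    R = rest - zbar_i
    S_i = R @ R.conj().T / (z.shape[1] - 1)           # assumed (n-1)^{-1} normalizer
    u = z[:, [i]] - zbar_i
    return (u.conj().T @ np.linalg.inv(S_i) @ u).real.item()

D_loo = np.array([loo_md(z, i) for i in range(n)])

# Estimated MD with the n^{-1}-normalized covariance, as in the appendix
zbar = z.mean(axis=1, keepdims=True)
Y = z - zbar
S = Y @ Y.conj().T / n
D_est = np.einsum('ij,ik,kj->j', Y.conj(), np.linalg.inv(S), Y).real

# Exact identity linking the two distances (it holds for every i, not just asymptotically)
identity = (1 - D_est / n) * (1 + D_loo / n) / (1 + D_loo / n ** 2)
```

Under these normalizers the product identity holds exactly for every observation, which is the bridge between Theorems 3 and 4.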
We present the main results of this paper in the following two theorems. The assumption of a normal distribution on the complex random variable is relaxed. Instead, we introduce two more general assumptions. Assume: (i) The entries of the complex random matrix Z_{n×p} are independent complex random variables, not necessarily identically distributed, with mean 0, variance 1, and 4th moments of arbitrary value β. The limiting ratio of the dimensions is p/n → c ∈ (0, 1). (ii) For any η > 0,

(1/(η² n p)) ∑_{j,k} E[|Z_jk|² I(|Z_jk| ≥ η √n)] → 0,

where I(·) is the indicator function. This assumption is a standard Lindeberg-type condition that guarantees convergence without the assumption of identical distribution.

Theorem 3.
Under assumptions (i) and (ii), let the 4th moment of the entries satisfy β < ∞. Then D_(i) in (5) is asymptotically normal: υ^{-1}(D_(i) − τ) converges in distribution to N(0, 1), where the centering term τ and the scaling term υ depend on c and β. By using this result, we derive the asymptotic distribution of the estimated MD in (4).

Theorem 4.
The asymptotic distribution of D_j in (4) is of the same normal form, with τ and υ as given in Theorem 3.
In the F-matrix F = S_2^{-1} S_1 = S_z^{-1} (z_j − z̄)(z_j − z̄)^*, the two components S_z^{-1} and (z_j − z̄)(z_j − z̄)^* are usually assumed to be independent. Theorem 3 is derived under this restriction, while Theorem 4 extends the setting and relaxes the independence assumption of Theorem 3.

Summary and Conclusions
This paper defines different types of MDs of complex random vectors with either known or unknown mean and covariance matrix. The MDs' first moments and the distributions of the MDs with known mean and covariance matrix are derived. Further, the asymptotic distributions of the estimated and leave-one-out MDs under complex non-normal distributions are investigated. We have relaxed several assumptions from our previous work [9]. The random variables in the MD are required to be independent, but not necessarily identically distributed. The fourth moment of the random variables can take an arbitrary value. The independence between the two components S_2 and S_1 of the F-matrix F = S_2^{-1} S_1 is not required for our results. In conclusion, the MDs of complex random vectors are useful tools in many situations, for example, robust analysis in signal processing [23,24]. The asymptotic properties of MDs can also be used in inferential studies in finance theory [25]. Further studies could build on the real and imaginary parts of MDs over a complex random sample.

Appendix A.1. Proof of Theorem 2

Proof. Follow the setting in (4). Then Hardin and Rocke [26] show that D_j, scaled by n(n − 1)^{-2}, follows a Beta(p/2, (n − p − 1)/2) distribution.
Appendix A.2. Proof of Theorem 3

Proof. By using Theorem 3.2 from [10], we can acquire the mean and covariance of D_(i). Set β to be arbitrary. The mean then follows directly, and the variance involves the factor (1 + hξ_1)(ξ_1 + h); from these we obtain the result.

Appendix A.3. Proof of Theorem 4

Proof. For the asymptotic distribution discussed, one may ignore the rank-one matrix z̄ z̄^* in the definition of the sample covariance matrix and define the sample covariance matrix to be S = n^{-1} ∑_{j=1}^n (z_j − z̄)(z_j − z̄)^*. This idea is given in [27]. Using the results from [9], set W = nS and W_(−i) = (n − 1) S_(−i) = ∑_{j=1, j≠i}^n y_j y_j^*, where y_j = z_j − z̄. Let D_(−i) = y_i^* S_(−i)^{-1} y_i. Then

|W_(−i) + y_i y_i^*| = |W_(−i)| (1 + y_i^* W_(−i)^{-1} y_i) and |W − y_i y_i^*| = |W| (1 − y_i^* W^{-1} y_i),

so that

|W_(−i) + y_i y_i^*| / |W_(−i)| = 1 + (n − 1)^{-1} D_(−i) and |W − y_i y_i^*| / |W| = 1 − n^{-1} D_i.

Hence

(1 − n^{-1} D_i)(1 + (n − 1)^{-1} D_(−i)) = 1.

Now D_(−i) is not independent of observation i, since z_i is included in the sample mean z̄, and hence it does not fully represent the leave-one-out estimator of D_i. However, using the identity (z_i − z̄) = ((n − 1)/n)(z_i − z̄_(i)) and substituting it into the above expressions, we find that

(1 − n^{-1} D_i) (1 + n^{-1} D_(i)) / (1 + n^{-2} D_(i)) = 1.

This can be used to derive the properties of D_i as a function of D_(i), and vice versa. Note that D_i is used in the proof to make the derivation easier to follow; this subscript can be replaced by D_j as in the theorem.
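The two determinant identities used above are instances of the matrix determinant lemma, |W ± y y^*| = |W| (1 ± y^* W^{-1} y). A quick numerical confirmation, with an arbitrary Hermitian positive-definite W chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
p = 5
B = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
W = B @ B.conj().T + np.eye(p)            # Hermitian positive definite
y = rng.standard_normal((p, 1)) + 1j * rng.standard_normal((p, 1))

Winv = np.linalg.inv(W)
q = (y.conj().T @ Winv @ y).item()        # the quadratic form y^* W^{-1} y

up = np.linalg.det(W + y @ y.conj().T)    # rank-one update of the determinant
down = np.linalg.det(W - y @ y.conj().T)  # rank-one downdate of the determinant
```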