Extensions of Some Statistical Concepts to the Complex Domain

This paper deals with the extension of principal component analysis, canonical correlation analysis, the Cramer–Rao inequality, and a few other statistical concepts from the real domain to the corresponding complex domain. Optimization of a Hermitian form under a linear constraint, of a bilinear form under Hermitian-form constraints, and similar maxima/minima problems in the complex domain are discussed. Some vector/matrix differential operators are developed to handle these types of problems. These operators in the complex domain and the corresponding optimization problems are believed to be new. The operators will also be useful in maximum likelihood estimation problems, as illustrated in the concluding remarks. Detailed steps are given in the derivations so that the methods are easily accessible to everyone.


Introduction
In most textbooks on statistics, scalar/vector/matrix-variate random variables in the complex domain, and the corresponding statistical analysis in the complex domain, are not discussed. But, in a large number of physical situations, it is natural or more convenient to represent variables in the complex domain. Hence, statistical techniques in the complex domain are required for data analysis in these situations. In the physical science and engineering literature, there are a number of papers dealing with random variables in the complex domain. Data reduction techniques such as principal component analysis in the complex domain seem to be the main topic in these areas. Most of the papers in these applied areas concentrate on developing algorithms for computing eigenvalues and eigenvectors, which are useful and relevant in principal component analysis, independent component analysis, factor analysis, and so on. Statistical analysis in the complex domain is widely used in the analysis of multi-look return signals in radar [1], in multi-task learning in artificial intelligence and machine learning [2], in problems such as signal processing [3], in principal component analysis and independent component analysis for analyzing meteorological data in the complex domain [4], in the optimal allocation of resources, especially energy resources [5], in holography, microscopy, and optical metrology [6], in delayed mixing in speech processing, in biomedical signal analysis, and in financial data modeling [7].
In the present paper, vector/matrix differential operators in the complex domain are defined. These are then applied to optimizing a Hermitian form under a linear constraint, a bilinear form under Hermitian-form constraints, etc. Then, as applications of these optimization problems, the real-domain techniques of principal component analysis and canonical correlation analysis are extended to the complex domain. Other statistical concepts, such as the Cramer–Rao inequality, the least square procedure, and related aspects, are also extended to the complex domain. Detailed derivations are given so that the methods will be accessible even to beginners.
The following notation is used in this paper. Scalar variables, whether mathematical or random, are denoted by lower-case letters such as x, y. Vector/matrix variables are denoted by capital letters such as X, Y. Scalar constants are denoted by a, b, etc., and vector/matrix constants by A, B, etc. The wedge product of the differentials dx and dy is defined as dx ∧ dy = −dy ∧ dx, where x and y are two real scalar variables, so that dx ∧ dx = 0 and dy ∧ dy = 0. If X = (x_jk) is an m × n matrix with distinct real scalar variables x_jk as elements, then dX = ∧_{j=1}^m ∧_{k=1}^n dx_jk. The transpose of a matrix X is denoted by a prime, as X′. For a p × p matrix X, if X = X′ (symmetric), then dX = ∧_{j≥k} dx_jk = ∧_{j≤k} dx_jk. Variables in the complex domain are written with a tilde, such as x̃, ỹ, X̃, Ỹ. If X̃ is a p × q matrix in the complex domain, then X̃ = X1 + iX2, i = √(−1), where X1, X2 are real, and dX̃ is defined as dX̃ = dX1 ∧ dX2. The determinant of a real p × p matrix Y is written as |Y| or as det(Y), and if Ỹ is in the complex domain, then the absolute value of the determinant of Ỹ is written as |det(Ỹ)|. Also, tr(Y) means the trace of the square matrix Y. A p × p real matrix A being positive definite is written as A > O, and X̃ = X̃* > O indicates that the matrix X̃, in the complex domain, is Hermitian positive definite. Other notation is explained when it occurs for the first time.
This paper is organized as follows: Section 2 starts by defining a vector differential operator in the complex domain. Then, with the help of this operator, some optimization problems are discussed, such as optimizing a linear form subject to a Hermitian-form constraint, a Hermitian form under a linear constraint, and a bilinear form with Hermitian-form constraints. Then, a matrix differential operator in the complex domain is defined. Differentiation of the trace of a product of matrices, and of the determinant of a matrix, by using the matrix differential operator in the complex domain, is dealt with. Section 3 deals with the extension of principal component analysis to the complex domain. Section 4 delves into the extension of canonical correlation analysis to the complex domain. Section 5 examines the extension of the Cramer–Rao inequality, the Cauchy–Schwarz inequality, and the least square estimation procedure to the complex domain. Detailed steps are given in each situation.

Optimization Involving Linear Forms, Traces, and Determinants
Here, we will consider linear forms and their differentiations first, and then we will consider some situations of optimizing a Hermitian form with a linear constraint and optimizing a linear form with a Hermitian-form constraint. We consider some basic results to start with. Let A = A1 + iA2 and X̃ = X1 + iX2, i = √(−1), where A1, A2, X1, X2 are real vectors. Assume that A is a non-null arbitrary coefficient vector and X̃ is a known vector random variable in the complex domain, and let ũ = A*X̃.
Theorem 1. For A and X̃ as defined above, with ũ = A*X̃ and the vector differential operator ∂/∂A = ∂/∂A1 + i ∂/∂A2, we have ∂ũ/∂A = 2X̃.
Proof. This can be easily seen from the following. Opening up ũ, we have ũ = (A1′ − iA2′)(X1 + iX2) = A1′X1 + A2′X2 + i(A1′X2 − A2′X1). From the corresponding real-variable case, ∂ũ/∂A1 = X1 + iX2 = X̃ and ∂ũ/∂A2 = X2 − iX1 = −iX̃. That is, ∂ũ/∂A = ∂ũ/∂A1 + i ∂ũ/∂A2 = X̃ + i(−iX̃) = 2X̃. Now, we consider a slightly more general result. Consider ũ = A*Σ11X̃, where A* = (A^c)′ and A^c means the complex conjugate of A.
Theorem 2. For X̃ and A as defined above, let ũ = A*Σ11X̃, where Σ11 is a p × p matrix free of the elements in A and X̃. Then, ∂ũ/∂A = 2Σ11X̃.
Proof. Since Σ11 and X̃ are free of the elements in A, we may take Ỹ = Σ11X̃ and write ũ = A*Ỹ. Then, from Theorem 1, ∂ũ/∂A = 2Ỹ = 2Σ11X̃.
Now, consider a p × 1 vector A, a p × p matrix Σ, and the Hermitian form ũ = A*ΣA in the complex domain, where Σ = Σ* > O is free of the elements in A. Then, we have the following result:
Theorem 3. For the Hermitian form ũ = A*ΣA as defined above, ∂ũ/∂A = 2ΣA.
Proof. Opening up ũ with A = A1 + iA2 and Σ = Σ1 + iΣ2, where Σ = Σ* implies Σ1 = Σ1′ and Σ2 = −Σ2′, the imaginary part vanishes and ũ = A1′Σ1A1 + A2′Σ1A2 + A2′Σ2A1 − A1′Σ2A2. Then, from the results in the real case, ∂ũ/∂A1 = 2Σ1A1 − 2Σ2A2 and ∂ũ/∂A2 = 2Σ1A2 + 2Σ2A1, from the corresponding real case. Hence, ∂ũ/∂A = ∂ũ/∂A1 + i ∂ũ/∂A2 = 2Σ1(A1 + iA2) + 2iΣ2(A1 + iA2) = 2(Σ1 + iΣ2)A = 2ΣA. This establishes the result.
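The operator ∂/∂A = ∂/∂A1 + i ∂/∂A2 and the conclusion of Theorem 3 can be checked numerically. The following is a minimal sketch (not part of the paper; it assumes the numpy library, and all names are illustrative) comparing a finite-difference evaluation of the operator applied to A*ΣA with 2ΣA:

```python
import numpy as np

rng = np.random.default_rng(9)
p = 3
B = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
Sigma = (B + B.conj().T) / 2                 # an arbitrary Hermitian matrix

A = rng.standard_normal(p) + 1j * rng.standard_normal(p)

def u(A1, A2):
    Ac = A1 + 1j * A2
    return (Ac.conj() @ Sigma @ Ac).real     # Hermitian form, real-valued

# Finite-difference gradients with respect to the real and imaginary parts
h = 1e-6
g1, g2 = np.zeros(p), np.zeros(p)
A1, A2 = A.real.copy(), A.imag.copy()
for j in range(p):
    e = np.zeros(p); e[j] = h
    g1[j] = (u(A1 + e, A2) - u(A1 - e, A2)) / (2 * h)
    g2[j] = (u(A1, A2 + e) - u(A1, A2 - e)) / (2 * h)

grad = g1 + 1j * g2                          # the operator d/dA1 + i d/dA2
target = 2 * Sigma @ A                       # Theorem 3: 2 Sigma A
```

With a seed fixed as above, `grad` agrees with `target` to finite-difference accuracy.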

Optimization of a Linear Form Subject to Hermitian-Form Constraint
Consider a linear form ũ = A*X̃, where A and X̃ are p × 1 non-null vectors, with A being arbitrary and X̃ being a known vector variable, so that Var(ũ) = A*Σ11A, where Σ11 = Σ11* > O is the covariance matrix in X̃. Consider the optimization of ũ, subject to the constraint that the variance of ũ is fixed, say unity, that is, A*Σ11A = 1. Then, we have the following result:
Theorem 4. For the linear form ũ = A*X̃ and the constraint A*Σ11A = 1, as defined above, max_A |ũ| = [X̃*Σ11^{-1}X̃]^{1/2}.
Proof. Consider w = A*X̃ − λ(A*Σ11A − 1), where λ is a Lagrangian multiplier. Observe that Σ11 = Σ11* and we assume that it is Hermitian positive definite. From the previous results in Theorems 1 and 3, ∂w/∂A = 2X̃ − 2λΣ11A = 0. That is, X̃ = λΣ11A, so that A = (1/λ)Σ11^{-1}X̃.   (9)
Premultiplying X̃ = λΣ11A by A* and using the constraint gives λ = A*X̃ = ũ, which means that the maximum of our linear form corresponds to the largest λ and the minimum to the smallest λ. But, from (9), λλ^c = |λ|² = X̃*Σ11^{-1}X̃, where |λ| means the absolute value of λ and λ^c is the complex conjugate of λ. Hence, max_A |ũ| = [X̃*Σ11^{-1}X̃]^{1/2}, X̃*Σ11^{-1}X̃ being a positive definite Hermitian form. This completes the proof.
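As a numerical illustration of Theorem 4 (our own sketch, not from the paper; it assumes numpy and all names are illustrative), the critical point A ∝ Σ11^{-1}X̃ can be scaled to satisfy the constraint and checked against [X̃*Σ11^{-1}X̃]^{1/2}:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4

# A Hermitian positive definite Sigma11 and a fixed complex vector X
B = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
Sigma11 = B @ B.conj().T + p * np.eye(p)     # Hermitian positive definite
X = rng.standard_normal(p) + 1j * rng.standard_normal(p)

# Maximizer from Theorem 4: A proportional to Sigma11^{-1} X,
# scaled so that A* Sigma11 A = 1
SinvX = np.linalg.solve(Sigma11, X)
c = (X.conj() @ SinvX).real                  # X* Sigma11^{-1} X, real and positive
A = SinvX / np.sqrt(c)

constraint = (A.conj() @ Sigma11 @ A).real   # should equal 1
value = A.conj() @ X                         # should equal sqrt(c)
```

Here `constraint` comes out as 1 and `value` as [X̃*Σ11^{-1}X̃]^{1/2}, matching the theorem.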

Optimization of a Hermitian Form Subject to Linear Constraint
Now, consider the problem of optimizing A*Σ11A subject to A*X̃ = α fixed, where A and X̃ are p × 1 vectors and Σ11 = Σ11* > O is a p × p Hermitian positive definite matrix, free of the elements of A. Then, we have the following result:
Theorem 5. For the Hermitian form A*Σ11A and the linear constraint A*X̃ = α, we have min_A [A*Σ11A] = |α|² / (X̃*Σ11^{-1}X̃).   (11)
Proof. Let λ be a Lagrangian multiplier and consider w = A*Σ11A − λ(A*X̃ − α). From Theorems 1 and 3, ∂w/∂A = 2Σ11A − 2λX̃ = 0, that is, A = λΣ11^{-1}X̃.   (12)
From (12) and the constraint, λ^c X̃*Σ11^{-1}X̃ = α, so that |λ|² = |α|²/(X̃*Σ11^{-1}X̃)², because X̃*Σ11^{-1}X̃ is a Hermitian positive definite form, where |λ| and |α| denote the absolute values of λ and α, respectively. Note that max_A [A*Σ11A] = +∞, since A is arbitrary and the linear restriction cannot eliminate the effect of A fully. Hence, we look for the minimum. From (12), A*Σ11A = |λ|² X̃*Σ11^{-1}X̃ = |α|²/(X̃*Σ11^{-1}X̃). This completes the proof.
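Theorem 5 can also be checked numerically. The sketch below (an illustration of ours, assuming numpy) forms the minimizer A = λΣ11^{-1}X̃, verifies the constraint, and compares its Hermitian form with that of an arbitrary feasible vector:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 5
B = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
Sigma = B @ B.conj().T + p * np.eye(p)            # Hermitian positive definite
X = rng.standard_normal(p) + 1j * rng.standard_normal(p)
alpha = 2.0 - 1.5j                                # fixed value of A* X

c = (X.conj() @ np.linalg.solve(Sigma, X)).real   # X* Sigma^{-1} X
A_opt = np.conj(alpha) / c * np.linalg.solve(Sigma, X)    # the minimizer
min_val = (A_opt.conj() @ Sigma @ A_opt).real     # should be |alpha|^2 / c

# Any other feasible A (with A* X = alpha) gives a larger Hermitian form
A0 = rng.standard_normal(p) + 1j * rng.standard_normal(p)
A0 = A0 * np.conj(alpha / (A0.conj() @ X))        # rescale so that A0* X = alpha
other_val = (A0.conj() @ Sigma @ A0).real
```

`min_val` equals |α|²/(X̃*Σ11^{-1}X̃) and is never exceeded downward by `other_val`.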

Differentiation of Traces of Matrix Products and Matrix Differential Operator
Now, we consider a few matrix-variate cases, where the problem is again essentially an optimization involving vector variables. Let A = (a_jk) be a p × q matrix of arbitrary elements a_jk. Let X̃ = (x̃_jk) be a p × q matrix in the complex domain with distinct scalar complex variables x̃_jk as elements. Consider ũ = tr(A*X̃). Let A_j and X̃_j be the j-th columns of A and X̃, respectively. Then, ũ = ∑_{j=1}^q A_j* X̃_j. Let A_j = A_j1 + iA_j2, i = √(−1), where A_j1, A_j2 are real p × 1 vectors, and let ũ_j = A_j* X̃_j. Consider the operator ∂/∂A_j = ∂/∂A_j1 + i ∂/∂A_j2; since we are taking partial derivatives, ∂ũ/∂A_j = ∂ũ_j/∂A_j. Then, we have the following result:
Theorem 6. Let ũ, A, A_j, X̃_j be as defined above. Then, ∂ũ/∂A = 2X̃.
Proof. From the previous result (Theorem 1), ∂ũ_j/∂A_j = 2X̃_j. Now, stack these up side by side as q columns for j = 1, ..., q. Note that A = [A_1, ..., A_q], ∂/∂A = [∂/∂A_1, ..., ∂/∂A_q], and ∂ũ/∂A_j = 2X̃_j; hence, ∂ũ/∂A = 2X̃. This completes the proof.
For the next result to be established, we need a result on matrices, which will be stated here as a lemma.
Lemma 1. For two p × p real matrices A and B, where A = A′ (symmetric) and B = −B′ (skew symmetric), tr(A′B) = 0.
Proof. For two p × p matrices A1 and B1, it is well known that tr(A1) = tr(A1′) and tr(A1B1) = tr(B1A1). Then, for our matrices A and B, tr(A′B) = tr[(A′B)′] = tr(B′A) = tr(−BA) = −tr(AB) = −tr(A′B), and hence tr(A′B) = 0.
For the next problem, let A = A* be a p × p Hermitian matrix of arbitrary elements a_jk. Let X̃ = X̃* be a Hermitian matrix in the complex domain with distinct complex scalar variables as elements, except for the Hermitian property.
Write A = A1 + iA2 and X̃ = X1 + iX2, where A1, X1 are real symmetric and A2, X2 are real skew symmetric matrices. Then, we have the following result:
Theorem 7. Let A = A*, X̃ = X̃*, and ũ = tr(A*X̃) be as defined above. Then, ∂ũ/∂A = 2X̃ − diag(X̃), where diag(X̃) means a diagonal matrix consisting of only the diagonal elements from X̃.
Proof. Opening up ũ, we have ũ = tr(A1′X1) + tr(A2′X2), since tr(A1′X2) = 0 and tr(A2′X1) = 0 by Lemma 1. Now, from the known result for symmetric matrices in the real case, ∂tr(A1′X1)/∂A1 = 2X1 − diag(X1), and ∂tr(A2′X2)/∂A2 = 2X2, since the diagonal elements in A2 and X2 are already zeros. Combining these, ∂ũ/∂A = ∂ũ/∂A1 + i ∂ũ/∂A2 = 2X1 − diag(X1) + i(2X2) = 2X̃ − diag(X̃), which corresponds to the result in the real case.

Differentiation of a Determinant in the Complex Domain
Here, we will start by defining the derivative of a scalar quantity with respect to a matrix; that is, we will define a matrix differential operator first. Let X = (x_jk) be an m × n real matrix and let X̃ = (x̃_jk) be an m × n matrix in the complex domain. Then, we can always write X̃ = X1 + iX2, where i = √(−1) and X1, X2 are real m × n matrices. The matrix differential operators ∂/∂X and ∂/∂X̃ are defined as ∂/∂X = (∂/∂x_jk) and ∂/∂X̃ = ∂/∂X1 + i ∂/∂X2.
Consider a p × p nonsingular matrix X̃ in the complex domain and let |X̃| be its determinant. Then, we have the following result:
Theorem 8. For the p × p nonsingular matrix X̃ in the complex domain, with distinct scalar complex variables as elements, ∂|X̃^c|/∂X̃ = 2C̃^c, where C̃ = (C̃_jk) is the matrix of cofactors of X̃.
Proof. The cofactor expansion of a determinant holds in the real and the complex domains, and it is the following:
|X̃| = x̃_j1 C̃_j1 + x̃_j2 C̃_j2 + ... + x̃_jp C̃_jp, for each j = 1, ..., p,   (17)
where C̃_jk is the cofactor of x̃_jk for all j and k. Let C̃ = (C̃_jk) be the matrix of cofactors. For a general matrix, all the x̃_jk's are distinct and the corresponding C̃_jk's are also distinct. Hence, in the real case, taking the partial derivative of the j-th line in (17), we have ∂|X|/∂x_jk = C_jk for all j and k, and hence, in the real case, ∂|X|/∂X = C. This is a known result. But, in the complex case, the situation is different. Before we tackle the complex-domain situation, we will develop some necessary tools. It is a known result that X̃^{-1} = (1/|X̃|)C̃′. When X̃ = X̃* (Hermitian), the eigenvalues are real, and hence the determinant is real. Then, X̃^{-1} is Hermitian, and thereby C̃ is also Hermitian. The following results will be helpful when we apply our matrix differential operators to scalar functions of matrices in the complex domain; for two scalar complex variables x̃ and ỹ, they can be easily verified.
For convenience, let us consider the term x̃_jk C̃_jk in (17), where x̃_jk = x_1jk + i x_2jk and C̃_jk = C_1jk + i C_2jk, with x_1jk, x_2jk, C_1jk, C_2jk being real scalar quantities. Note that x̃_jk = x̃*_kj when X̃ is Hermitian. Hence, for example, differentiating with respect to x̃_21 is equivalent to differentiating with respect to x̃*_12, and vice versa; when x̃*_12 C̃*_12 is differentiated with respect to x̃_12, it is equivalent to differentiating x̃_21 C̃_21 with respect to x̃_12, and so on. Then, applying the operator ∂/∂x̃_jk = ∂/∂x_1jk + i ∂/∂x_2jk to the conjugate of the (j, k)-th term, we obtain ∂(x̃_jk C̃_jk)^c/∂x̃_jk = 2C̃^c_jk.   (19)
This is the derivative of the complex conjugate of the (j, k)-th term in (17) for all j and k when X̃ is a general matrix with distinct scalar complex variables as elements. Then, ∂|X̃^c|/∂X̃ = 2C̃^c.   (20)
When X̃ = X̃* (Hermitian), the determinant is real, so that |X̃^c| = |X̃|, but there is one more element contributing to x_1jk and x_2jk, namely the j-th term in the k-th line of (17). Thus, the sum of the contributions coming from these two terms is the derivative of |X̃| with respect to x̃_jk; observing that C̃_kj = C̃^c_jk for all j ≠ k, this sum is 2C̃^c_jk. When j = k, the diagonal elements in X̃ and C̃ are real, and hence the term occurs only once, and therefore the derivative is C̃_jj. Since we are taking partial derivatives, the computations run as in the differentiation of the individual terms above. Here, we have used two properties: for a p × p matrix B, |B| = |B′|, and (B^c)′ = B* = B if B is Hermitian. Then, we have the following theorem:
Theorem 9. When the p × p nonsingular matrix X̃ in the complex domain is Hermitian, ∂|X̃|/∂X̃ = [2C̃ − diag(C̃)]^c, where diag(C̃) is the diagonal matrix of the diagonal elements of C̃.
In the following sections, we will consider some applications of the results obtained in Section 2.
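The real-case identity ∂|X|/∂x_jk = C_jk used in the proof can be verified by finite differences. The following sketch (ours, assuming numpy; illustrative only) compares the numerical gradient of the determinant with the cofactor matrix C = |X|(X^{-1})′:

```python
import numpy as np

rng = np.random.default_rng(2)
p = 4
X = rng.standard_normal((p, p))

# Cofactor matrix via the adjugate relation: X^{-1} = C' / |X|, i.e. C = |X| (X^{-1})'
C = np.linalg.det(X) * np.linalg.inv(X).T

# Central finite-difference approximation of d|X| / dx_jk
h = 1e-6
G = np.zeros((p, p))
for j in range(p):
    for k in range(p):
        Xp = X.copy(); Xp[j, k] += h
        Xm = X.copy(); Xm[j, k] -= h
        G[j, k] = (np.linalg.det(Xp) - np.linalg.det(Xm)) / (2 * h)
```

The numerical gradient `G` matches the cofactor matrix `C` entrywise.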

Principal Component Analysis in the Complex Domain
This is an application of the mathematical problem of optimization of a Hermitian form under a Hermitian-form constraint.
In many physical situations, variables occur in pairs, such as time and phase, and hence the most appropriate representation of such variables is through complex variables, because a scalar complex variable can be taken as a pair of real variables. Let x̃ = x1 + ix2, i = √(−1), where x1, x2 are real scalar variables, be a scalar complex variable. Then, a statistical density associated with x̃ is a real-valued scalar function f(x̃) of x̃ such that f(x̃) ≥ 0 over the entire complex plane, something like a hill on the complex plane, so that the total volume under the surface is unity, that is, ∫_x̃ f(x̃)dx̃ = 1, where dx̃ = dx1 ∧ dx2, the wedge product of the differentials dx1 and dx2. Then, the center of gravity of the hill f(x̃) is at E[x̃] = ∫_x̃ x̃ f(x̃)dx̃, and the square of the measure of scatter in x̃ is σ² = E[|x̃ − E(x̃)|²], where E[(•)] means the expected value when x̃ is a scalar complex random variable and f(x̃) is its density.
If the scatter is small, then the variable x̃ is concentrated near the center of gravity E[x̃]. If σ² is large, then x̃ is spread thin, far and wide, and hence it is more or less unrecognizable. If a large number of scalar variables are being considered as possible variables to be included in a model, then the ones with the larger scatter are the important variables to include. For convenience, one can consider linear functions of such variables, because linear functions also contain the individual variables. A linear function is of the form ũ = a1* x̃1 + ... + ap* x̃p, where a1, ..., ap are constants and x̃1, ..., x̃p are scalar complex variables. For example, a2 = ... = ap = 0, a1 = 1, gives x̃1. Here, a* indicates the conjugate transpose, and for a scalar quantity ỹ, ỹ* = ỹ^c only, where c in the exponent indicates the complex conjugate. We may write the linear function as ũ = A*X̃, where A′ = [a1, ..., ap] and X̃′ = [x̃1, ..., x̃p], a prime indicating the transpose. The expected value of ũ is E[ũ] = A*E[X̃], and the variance-covariance matrix, or the covariance matrix, in X̃ is denoted by Σ > O (Hermitian positive definite). Then, the variance of ũ, denoted by Var(ũ), is given by Var(ũ) = A*ΣA. The most important linear function is the one having the maximum variance. Hence, we may compute max_A [A*ΣA]. But, since Σ > O, A*ΣA is a positive definite Hermitian form in A and its maximum over A is at +∞. Thus, unrestricted maximization does not make sense. Without loss of generality, we may take A*A = 1, because this can always be achieved for any non-null A.
Then, the maximization amounts to maximizing A*ΣA within the unit sphere A*A = 1. There will be a maximum and a minimum in this case. We may incorporate the restriction by using a Lagrangian multiplier. Consider
w = A*ΣA − λ(A*A − 1),   (22)
where λ is a Lagrangian multiplier. Maximization will be achieved by using the following result, which will be stated as a lemma.
Lemma 2. Let Ỹ = Y1 + iY2 be a p × 1 vector in the complex domain and let B = B* be a p × p Hermitian matrix free of the elements of Ỹ. Then, with the operator ∂/∂Ỹ = ∂/∂Y1 + i ∂/∂Y2, we have ∂(Ỹ*BỸ)/∂Ỹ = 2BỸ. This follows as in the proof of Theorem 3, where we have used the property that a 1 × 1 real quantity equals its transpose, so that the cross terms cancel, and the real operators ∂/∂Y1 and ∂/∂Y2 acting on real linear and quadratic forms give the known real-case results. For B = I, the identity matrix, the result on Ỹ*Ỹ follows, that is, ∂(Ỹ*Ỹ)/∂Ỹ = 2Ỹ.
Now, by using Lemma 2, we can differentiate w in (22). That is,
∂w/∂A = 2ΣA − 2λA = 0 ⇒ ΣA = λA.   (23)
Since A is non-null, (23) requires |Σ − λI| = 0, where |(•)| means the determinant of the square matrix (•); thus, λ is an eigenvalue of Σ and A a corresponding eigenvector. Premultiplying (23) by A* and using the fact that A*A = 1, we have A*ΣA = λ. Hence, max_A [A*ΣA] = λ1 and min_A [A*ΣA] = λp, where λ1 is the largest eigenvalue of Σ = Σ* > O and λp is the smallest eigenvalue of Σ. When Σ is Hermitian, all its eigenvalues are real, and when it is Hermitian positive definite, all its eigenvalues are also real positive. Hence, the procedure is the following: take the largest eigenvalue of Σ, say λ1; then, through (23), compute an eigenvector corresponding to λ1, that is, solve ΣA = λ1A for an A.
Then, normalize this eigenvector through A1*A1 = 1. This A1 is the normalized eigenvector corresponding to λ1. Now, consider ũ1 = A1*X̃. This ũ1 is the first principal component, in the sense of being the linear function having the maximum variance. Now, take the second largest eigenvalue λ2, go through the same procedure, and construct the normalized eigenvector A2 corresponding to λ2. Then, ũ2 = A2*X̃ is the second principal component. Continue the process, and stop when the variance of ũj, namely λj, falls below a preassigned number. If there is no preassigned number, then ũ1, ..., ũp will be the p principal components. Here, we have assumed that the eigenvalues of Σ are distinct. When the eigenvalues are distinct, we can show that the eigenvectors corresponding to the distinct eigenvalues of a symmetric or Hermitian matrix are orthogonal to each other. Hence, our principal components will be orthogonal to each other, in the sense that the joint dispersion in the pair (ũi, ũj) is 0 for i ≠ j, or the covariance between ũi and ũj is zero when i ≠ j. The covariance is defined as the following: let Ũ be a p × 1 vector in the complex domain and let Ṽ be a q × 1 vector in the complex domain. Then, the covariance of Ũ on Ṽ is defined and denoted as Cov(Ũ, Ṽ) = E[(Ũ − E(Ũ))(Ṽ − E(Ṽ))*], whenever this expected value exists, so that when Ṽ = Ũ, then Cov(Ũ, Ũ) = Cov(Ũ), the covariance matrix in Ũ, and when p = 1, it is the variance of the scalar complex variable ũ.
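The population procedure above can be sketched in a few lines. The following is an illustrative sketch of ours (assuming numpy, not code from the paper); np.linalg.eigh handles Hermitian matrices, returning real eigenvalues in ascending order together with orthonormal eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(3)
p = 4
B = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
Sigma = B @ B.conj().T + np.eye(p)        # Hermitian positive definite covariance

lam, V = np.linalg.eigh(Sigma)            # real eigenvalues, ascending order
lam, V = lam[::-1], V[:, ::-1]            # reorder: largest eigenvalue first

A1 = V[:, 0]                              # normalized eigenvector for lambda_1
var_u1 = (A1.conj() @ Sigma @ A1).real    # variance of the first principal component
```

The variance of the first principal component equals the largest eigenvalue, and the eigenvector matrix is unitary, which reflects the orthogonality of the components.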
When the covariance matrix Σ is unknown, we may construct sample principal components. Let our population be the p × 1 vector X̃, X̃′ = [x̃1, ..., x̃p], where x̃j, j = 1, ..., p, are distinct scalar complex variables. Consider n independently and identically distributed (iid) such p-vectors. Then, we have a simple random sample of size n from X̃, and the sample matrix is the p × n, n > p, matrix X̃ = [X̃1, ..., X̃n]. Let the sample average be denoted by X̄̃ = (1/n)[X̃1 + ... + X̃n] and the matrix of sample averages by the bold letter X̃ = [X̄̃, ..., X̄̃]. Then, the sample sum of products matrix S is given by S = (s̃_jk), where s̃_jk = ∑_{r=1}^n (x̃_jr − x̄̃_j)(x̃_kr − x̄̃_k)*. The motivation for using the sample sum of products matrix S is that (1/(n−1))S is an unbiased estimator of Σ. Since we will be normalizing the eigenvectors, we may operate with S itself. Compute the eigenvalues of S. Take the largest eigenvalue of S; call it m1. Construct an eigenvector corresponding to m1 and normalize it through M1*M1 = 1, where M1 is the normalized eigenvector corresponding to m1. Then, ṽ1 = M1*X̃ is the first sample principal component. When the columns of the sample matrix are not linearly related, then we have S = S* > O (Hermitian positive definite) and all the eigenvalues m1, ..., mp will be positive. We assume that the eigenvalues are distinct, m1 > m2 > ... > mp; this will be true almost surely. Now, take m2, construct M2 and the second sample principal component ṽ2 = M2*X̃, and continue the process. We can show that the covariances between ṽj and ṽk will be zeros for all j ≠ k. This property follows from the fact that when the matrix is symmetric or Hermitian, the eigenvectors corresponding to distinct eigenvalues are orthogonal. When the population X̃ is p-variate complex Gaussian, then we can show that S will be complex Wishart distributed with n − 1 degrees of freedom and parameter matrix Σ. The distributions of the largest, smallest, and j-th largest eigenvalues and the corresponding eigenvectors of S in the complex domain are given in [8].
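A minimal sketch of the sample procedure (our own illustration, assuming numpy; names are not from the paper):

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 3, 50
Xs = rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))  # sample matrix
xbar = Xs.mean(axis=1, keepdims=True)     # sample average vector

# Sample sum of products matrix: S = sum_r (X_r - xbar)(X_r - xbar)*
D = Xs - xbar
S = D @ D.conj().T                        # Hermitian, positive definite a.s. for n > p

m, M = np.linalg.eigh(S)
m, M = m[::-1], M[:, ::-1]                # m1 > m2 > ... almost surely

v1 = M[:, 0].conj() @ Xs                  # scores of the first sample principal component
```

The sum of squared absolute deviations of the first component's scores reproduces the largest eigenvalue m1, since M1*SM1 = m1.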

Canonical Correlation Analysis in the Complex Domain
This is an application of the mathematical problem of optimizing a bilinear form in the complex domain under two Hermitian-form constraints. The application concerns the prediction of one set of variables by using another set of variables. Consider two sets of scalar complex variables, S1 = {x̃1, ..., x̃p} and S2 = {ỹ1, ..., ỹq}, where p need not be equal to q. Consider the appended vector Z̃′ = [X̃′, Ỹ′] = [x̃1, ..., x̃p, ỹ1, ..., ỹq] and the partitioned covariance matrix Σ = Cov(Z̃) = [Σ11, Σ12; Σ21, Σ22], where Σ11 = Cov(X̃), Σ22 = Cov(Ỹ), Σ12 = Cov(X̃, Ỹ), and Σ21 = Σ12*, and vice versa. That is, Σ is the covariance matrix in Z̃, Σ11 is the covariance matrix in X̃, Σ22 is the covariance matrix in Ỹ, and so on. Our aim is to predict the variables in the set S1 by using the variables in the set S2, and vice versa, and to obtain the "best" predictors, "best" in the sense of having the maximum joint dispersion. In order to represent each set S1 and S2, we will take arbitrary linear functions of the variables in each set. Consider the linear functions ũ = A*X̃ and ṽ = B*Ỹ, where A′ = [a1, ..., ap] and B′ = [b1, ..., bq], with X̃ and Ỹ as listed above. Since the aj and bj are scalar constant quantities, aj* = aj^c, j = 1, ..., p, and bj* = bj^c, j = 1, ..., q. Variances of linear functions were already seen in Section 2. Therefore, Var(ũ) = A*Σ11A, Var(ṽ) = B*Σ22B, and Cov(ũ, ṽ) = A*Σ12B, with Σ21 = Σ12* and B*Σ21A = Cov(ṽ, ũ). Here, Σ12 and Σ21 can be taken as measures of the joint dispersion or joint variation between X̃ and Ỹ, and A*Σ12B = Cov(ũ, ṽ) as the joint dispersion between ũ and ṽ. As a criterion for the "best" predictor of ũ by using ṽ, and vice versa, we may require that the pair ũ and ṽ have the maximum joint variation: the best predictor of ũ by using ṽ is that pair having the maximum A*Σ12B, and the best predictor of ṽ by using ũ is that pair having the maximum value of B*Σ21A. Since covariances depend upon the units of measurement of the variables involved, we may take a scale-free covariance, ρ = Cov(ũ, ṽ)/[Var(ũ)Var(ṽ)]^{1/2}.
Further, as explained in Section 2, without loss of generality we may take A*Σ11A = 1 and B*Σ22B = 1, that is, confine the bilinear form (hyperboloid) within unit positive definite Hermitian forms (ellipsoids), in order to prevent it from going to +∞. Hence, our procedure simplifies to optimizing A*Σ12B subject to the conditions A*Σ11A = 1 and B*Σ22B = 1, and computing the pair A and B which maximizes A*Σ12B. As before, we may use Lagrangian multipliers λ1 and λ2 and consider the function
w = A*Σ12B − λ1(A*Σ11A − 1) − λ2(B*Σ22B − 1).   (25)
In order to optimize this w, we need one result on the differentiation of a bilinear form, which will be stated as a lemma.
Lemma 3. Let X̃ = X1 + iX2, i = √(−1), where X1, X2 are real p × 1 vectors of distinct real scalar variables x_j1 and x_j2, respectively, with x̃j = x_j1 + ix_j2, j = 1, ..., p. Let Ỹ = Y1 + iY2, where Y1, Y2 are real q × 1 vectors of distinct real scalar variables y_j1 and y_j2, respectively, with ỹj = y_j1 + iy_j2, j = 1, ..., q, where x_j1, x_j2, y_j1, y_j2 are real. Let the partial differential operators be as defined in Section 2, namely, ∂/∂X̃ = ∂/∂X1 + i ∂/∂X2, and similarly for ∂/∂Ỹ. Let A be a p × q constant matrix. Then, ∂(X̃*AỸ)/∂X̃ = 2AỸ and ∂(Ỹ*A*X̃)/∂Ỹ = 2A*X̃, irrespective of whether A is real or in the complex domain.
Proof. Opening up X̃*AỸ and using the known results in the real case, as in the proof of Theorem 1 with AỸ taking the place of the known vector, the first result follows; the second follows in the same manner. This completes the proof. Now, differentiating w in (25), we have the following:
Σ12B = λ1Σ11A   (26)
Σ21A = λ2Σ22B.   (27)
Now, premultiply (26) by A* and (27) by B* to obtain, observing that A*Σ11A = 1 and B*Σ22B = 1, λ1 = A*Σ12B and λ2 = B*Σ21A = λ1^c. Take B from (27), B = (1/λ2)Σ22^{-1}Σ21A, and substitute into (26) to obtain
Σ11^{-1}Σ12Σ22^{-1}Σ21A = µA, µ = λ1λ2 = |λ1|².   (28)
This shows that µ is an eigenvalue of Σ11^{-1}Σ12Σ22^{-1}Σ21, where the matrix is p × p. From symmetry, it follows that µ is also an eigenvalue of Σ22^{-1}Σ21Σ11^{-1}Σ12, where the matrix is q × q. Hence, all the nonzero µ's are common to both of these matrices. Hence, the procedure is the following: if p ≤ q, compute the eigenvalues of Σ11^{-1}Σ12Σ22^{-1}Σ21; otherwise, compute the eigenvalues of the other matrix, Σ22^{-1}Σ21Σ11^{-1}Σ12; both will give the same nonzero eigenvalues. Let µ1 be the largest and µr the smallest nonzero eigenvalue. Then, we have max |A*Σ12B| = µ1^{1/2} and, over the nonzero eigenvalues, min |A*Σ12B| = µr^{1/2}. Then, the procedure is the following: if p ≤ q, compute all the eigenvalues of Σ11^{-1}Σ12Σ22^{-1}Σ21. Let the largest eigenvalue be µ1. Then, compute one eigenvector corresponding to µ1 from Equation (28) and normalize it through A1*Σ11A1 = 1. Then, compute ũ1 = A1*X̃. Then, use the same eigenvalue µ1 and compute one eigenvector B1 from the equation Σ22^{-1}Σ21Σ11^{-1}Σ12B = µ1B, normalized through B1*Σ22B1 = 1. Then, (ũ1, ṽ1 = B1*Ỹ) is the first pair of canonical variables, in the sense that ũ1 is the best predictor of ṽ1 and vice versa. Now, take the second largest eigenvalue µ2 of Σ11^{-1}Σ12Σ22^{-1}Σ21. Then, compute one
eigenvector corresponding to µ2, that is, solve the equation Σ11^{-1}Σ12Σ22^{-1}Σ21A = µ2A and normalize through A2*Σ11A2 = 1. Now, compute ũ2 = A2*X̃. Use the same µ2 and solve for B from the equation Σ22^{-1}Σ21Σ11^{-1}Σ12B = µ2B, normalized through B2*Σ22B2 = 1. Then, (ũ2, ṽ2 = B2*Ỹ) is the second pair of canonical variables. Continue the process until µj falls below a preassigned limit. If there is no such preassigned limit, then compute all the pairs, that is, p pairs if p ≤ q, and otherwise q. If q < p, then start with the computation of the eigenvalues of Σ22^{-1}Σ21Σ11^{-1}Σ12 and proceed parallel to the steps used in the case p ≤ q. Observe that Σ11^{-1}Σ12Σ22^{-1}Σ21 has the same nonzero eigenvalues as the symmetric format Σ11^{-1/2}Σ12Σ22^{-1}Σ21Σ11^{-1/2}; this form is also available from the same starting Equation (28). The symmetric format can always be written in the form C*C for some matrix C, and hence the symmetric form is either Hermitian positive definite or Hermitian positive semi-definite, and therefore all the nonzero eigenvalues are positive. Let us assume that the eigenvalues µ1, µ2, ... are distinct. It is a known result that eigenvectors corresponding to distinct eigenvalues of Hermitian or symmetric matrices are orthogonal to each other. Hence, ũ1, ũ2, ... are uncorrelated. Similarly, ṽ1, ṽ2, ... are uncorrelated.
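The eigenvalue computation at the heart of the procedure can be sketched as follows (an illustrative numpy sketch of ours, using sample covariances in place of the Σ_jk's; names are not from the paper). The nonzero eigenvalues µ are the squared canonical correlations and lie in [0, 1]:

```python
import numpy as np

rng = np.random.default_rng(5)
p, q, n = 3, 4, 200
Z = rng.standard_normal((p + q, n)) + 1j * rng.standard_normal((p + q, n))
Z = Z - Z.mean(axis=1, keepdims=True)

Sig = Z @ Z.conj().T / (n - 1)            # sample covariance of the appended vector
S11, S12 = Sig[:p, :p], Sig[:p, p:]
S21, S22 = Sig[p:, :p], Sig[p:, p:]

# Eigenvalues of S11^{-1} S12 S22^{-1} S21 (here p <= q, so this is the p x p matrix)
Mmat = np.linalg.solve(S11, S12 @ np.linalg.solve(S22, S21))
mu = np.sort(np.linalg.eigvals(Mmat).real)[::-1]   # theoretically real, in [0, 1]
```

The largest value µ1 gives max |A*Σ12B| = µ1^{1/2}; the eigenvalues are real and nonnegative because the matrix shares its spectrum with the Hermitian symmetric format.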
If Σ11, Σ12, Σ22 are not available, then take a simple random sample of size n, n > p + q, from Z̃′ = [X̃′, Ỹ′]. Then, compute the sample sum of products matrix, partitioned into S11, S12, S21, S22, as in the case of principal component analysis. Now, continue with the S_jk's as with the Σ_jk's. Then, we will obtain the pairs of sample canonical variables. Some distributional aspects of sample canonical variables and the sample canonical correlation matrix Ũ = S11^{-1/2}S12S22^{-1}S21S11^{-1/2} are discussed in [8]. Note that when S11 is 1 × 1, then we have the square of the multiple correlation coefficient in Ũ = ũ, which is scalar. In this case, our starting set S1 will have only one complex scalar variable and the set S2 will have q variables. Here, the problem is to predict the one variable in S1 by using the variables in S2. The exact distributions of the canonical correlation matrix Ũ and of the square of the absolute value of the multiple correlation coefficient ũ are available in explicit forms in [8].

Covariance and Correlation in the Complex Domain
Consider scalar complex variables first. Let x̃ and ỹ be two scalar complex variables, or scalar variables in the complex domain. Let σ11 = Var(x̃) = E[(x̃ − E(x̃))(x̃ − E(x̃))*], σ22 = Var(ỹ), and σ12 = Cov(x̃, ỹ) = E[(x̃ − E(x̃))(ỹ − E(ỹ))*], where, for example, x̃* = x̃^c, with * indicating the conjugate transpose and c indicating the conjugate only. Here, we have only scalar variables, and hence the complex conjugate transpose is only the complex conjugate. For convenience, we have used the notation σ11, σ22, σ12. Then, σ21 = E[(ỹ − E(ỹ))(x̃ − E(x̃))^c] = Cov(ỹ, x̃) = σ12^c. For a scalar complex variable x̃ = x11 + ix12, x̃* = x11 − ix12 = x̃^c. Let us examine the variances of the sum and difference. Then,
Var(x̃ ± ỹ) = Var(x̃) + Var(ỹ) ± [Cov(x̃, ỹ) + Cov(ỹ, x̃)] = σ11 + σ22 ± (σ12 + σ12^c).
This can be simplified as Var(x̃ ± ỹ) = σ11 + σ22 ± 2ℜ(σ12). Therefore, we may define the correlation coefficient in the complex domain, denoted by r, as
r = ℜ(σ12)/[σ11σ22]^{1/2},
where ℜ(•) denotes the real part of (•). Also, note that |x̃ỹ*|² = |x̃|²|ỹ|². This will motivate us to examine the dot product of two vectors in the complex domain and the Cauchy–Schwarz inequality in the complex domain.

The Cauchy-Schwarz Inequality in the Complex Domain
Let X̃, X̃′ = [x̃1, ..., x̃p], and Ỹ, Ỹ′ = [ỹ1, ..., ỹp], be two p × 1 vectors in the complex domain. We will define the dot product between X̃ and Ỹ; it will be denoted and defined as X̃ ⊙ Ỹ = X̃*Ỹ. Then,
|X̃*Ỹ|² ≤ (X̃*X̃)(Ỹ*Ỹ).
Thus, the Cauchy–Schwarz inequality holds for the complex domain also.
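The inequality is easy to confirm numerically; the following is a small sketch of ours (assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(6)
p = 6
X = rng.standard_normal(p) + 1j * rng.standard_normal(p)
Y = rng.standard_normal(p) + 1j * rng.standard_normal(p)

lhs = abs(X.conj() @ Y) ** 2                       # |X* Y|^2
rhs = (X.conj() @ X).real * (Y.conj() @ Y).real    # (X* X)(Y* Y)
```

For any two complex vectors, `lhs` never exceeds `rhs`, with equality when one vector is a scalar multiple of the other.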

Minimum-Variance Unbiased Estimators in the Complex Domain
In the class of linear estimators A*X̃ for the parametric function g(θ), which linear function is the minimum-variance unbiased estimator for g(θ)?
For unbiasedness, E[A*X̃] = A*μ̃ = g(θ), and Var(A*X̃) = A*ΣA, where E[X̃] = μ̃ and Σ is the covariance matrix in X̃. Our aim here is to minimize A*ΣA subject to the constraint A*μ̃ = g(θ) (given). Let λ be a Lagrangian multiplier and let w = A*ΣA − λ(A*μ̃ − g(θ)). Then,
∂w/∂A = 2ΣA − 2λμ̃ = 0 ⇒ ΣA = λμ̃.   (34)
That is, A*ΣA = λA*μ̃ = λg(θ); hence, g(θ) multiplied by the minimizing value of λ gives the minimum of A*ΣA. From (34), A = λΣ^{-1}μ̃, and the constraint gives λ^c = g(θ)/(μ̃*Σ^{-1}μ̃), so that |g(θ)|²/(μ̃*Σ^{-1}μ̃) is the minimum value of the variance of our linear estimator. Hence, the minimum-variance unbiased estimator is A*X̃ = [g(θ)/(μ̃*Σ^{-1}μ̃)] μ̃*Σ^{-1}X̃. That no unbiased linear estimator can do better follows from the inequality established in Section 5.1.
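The closed-form minimizer can be checked against arbitrary competing unbiased linear estimators. The sketch below (ours, assuming numpy; all names illustrative) rescales random coefficient vectors to satisfy the unbiasedness constraint and confirms that none achieves a smaller variance:

```python
import numpy as np

rng = np.random.default_rng(7)
p = 4
B = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
Sigma = B @ B.conj().T + np.eye(p)                  # covariance of X
mu = rng.standard_normal(p) + 1j * rng.standard_normal(p)   # E[X]
g = 1.0 + 2.0j                                      # value g(theta) to be estimated

d = (mu.conj() @ np.linalg.solve(Sigma, mu)).real   # mu* Sigma^{-1} mu, real positive
A_opt = np.conj(g) / d * np.linalg.solve(Sigma, mu) # optimal coefficients
min_var = (A_opt.conj() @ Sigma @ A_opt).real       # equals |g|^2 / d

# Every other unbiased linear estimator (A0* mu = g) has at least this variance
worst = min_var
for _ in range(200):
    A0 = rng.standard_normal(p) + 1j * rng.standard_normal(p)
    A0 = A0 * np.conj(g / (A0.conj() @ mu))         # rescale so that A0* mu = g
    worst = min(worst, (A0.conj() @ Sigma @ A0).real)
```

`A_opt` satisfies the unbiasedness constraint exactly, and `worst` never falls below `min_var`.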

Cramer-Rao-Type Inequality in the Complex Domain
Let x̃1, ..., x̃n be a simple random sample from the population designated by the density f(x̃j), where the x̃j, j = 1, ..., n, are independently and identically distributed (iid). Then, the joint density is L = ∏_{j=1}^n f(x̃j). Let T(x̃1, ..., x̃n), denoted as T(X̃), be a statistic with expected value g(θ). That is, E[T] = ∫_X̃ T L dX̃ = g(θ). Differentiating with respect to θ, we have
g′(θ) = ∫_X̃ T (∂L/∂θ) dX̃ = ∫_X̃ T (∂ ln L/∂θ) L dX̃ = E[T(∂ ln L/∂θ)],   (35)
where we have assumed that the support of X̃ is free of θ and that differentiation inside the integral is valid. But, from the total integral being one, we have
0 = (d/dθ) ∫_X̃ L dX̃ = E[∂ ln L/∂θ].   (36)
From (35) and (36), we have E[T(∂ ln L/∂θ)] = Cov(T, ∂ ln L/∂θ), because for any two scalar random variables u and v, real, or ũ and ṽ in the complex domain, Cov(ũ, ṽ) = E[ũṽ*] − E[ũ](E[ṽ])*, and the corresponding results in the complex domain also hold. Therefore, by the Cauchy–Schwarz inequality, |g′(θ)|² = |Cov(T, ∂ ln L/∂θ)|² ≤ Var(T) Var(∂ ln L/∂θ), so that Var(T) ≥ |g′(θ)|²/E[(∂ ln L/∂θ)²], since Var(∂ ln L/∂θ) = E[(∂ ln L/∂θ)²] by (36). This shows that the Cramer–Rao-type inequality holds in the complex domain also.

Least Square Estimation in Linear Models in the Complex Domain
Let us examine whether the least square procedure, in the class of linear models, holds in the complex domain also. Let x̃1, ..., x̃k be preassigned complex numbers or observations on k random variables in the complex domain, and let ỹ be a scalar complex variable. Then, a linear model in x̃1, ..., x̃k for predicting ỹ can be of the form ỹ = a0^c + a1^c x̃1 + ... + ak^c x̃k plus an error term. But, in order to predict ỹ by using linear predictors, we must know the conditional distribution of ỹ, given x̃1, ..., x̃k, and also the conditional expectation must be linear in x̃1, ..., x̃k.
If the conditional distribution is not known, then we may use a distribution-free procedure. One such procedure is estimation by the method of least squares. In this method, we set up a corresponding model of the following form for the j-th observation on ỹ, namely ỹj, for j = 1, ..., n, where n > k + 1 is the sample size:

ỹj = a0 + a1x1j + ⋯ + akxkj + ẽj, (39)

corresponding to the linear model in the real case, where ẽj is the random part, or the sum total of contributions coming from unknown factors, corresponding to ỹj. Then, if we sum up the observations and divide by n, we obtain the sample averages ȳ, x̄r, r = 1, ..., k, where, for example, x̄r = (1/n)∑_{j=1}^n xrj. Then, from (39), we have

ȳ = a0 + a1x̄1 + ⋯ + akx̄k. (40)

We have taken the error sum as zero without much loss of generality. Since a0 is available from (40), we may rewrite (39) as follows:

ỹj − ȳ = a1(x1j − x̄1) + ⋯ + ak(xkj − x̄k) + ẽj, j = 1, ..., n. (41)

We may write all the equations in (41) together in matrix form as Ũ = Zβ + ẽ. Note that Ũ is an n × 1 matrix, Z is an n × k matrix, β is a k × 1 matrix, and ẽ is an n × 1 matrix. Then, the sum of squares of the absolute values of the errors is the following:

ẽ*ẽ = (Ũ − Zβ)*(Ũ − Zβ). (42)

In the least square procedure, we minimize this error sum of squares of the absolute values of the errors, and then estimate the parameter vector β. Note that (Ũ − Zβ)* = Ũ* − β*Z*, and equating the derivative of (42) with respect to β to a null vector gives Z*Zβ = Z*Ũ, where we have assumed that Z*Z is a nonsingular matrix because the xrj's are preassigned numbers, and hence, Z can be taken as a full-rank matrix with rank k < n. Then, the estimated β, as per the least square estimate, again denoted by β, is

β = (Z*Z)^(−1)Z*Ũ (43)

and the estimated model for ỹ is â0 + β*X, X′ = [x1, ..., xk], β* = Ũ*Z(Z*Z)^(−1), or the estimated ỹ is

ỹ = â0 + β*X = â0 + Ũ*Z(Z*Z)^(−1)X, (44)

where β is available from (43) and â0 = ȳ − β*X̄, X̄ = [x̄1, ..., x̄k]′. This shows that the least square procedure in the complex domain also runs parallel to that in the real domain. If ẽ is assumed to have an n-variate complex Gaussian distribution, then the inference problems also run parallel to those in the real domain.
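A brief numerical sketch of (43), assuming numpy and taking Z and Ũ as already-formed (centered) design matrix and observation vector with illustrative values: the normal-equation solution (Z*Z)^(−1)Z*Ũ agrees with numpy's built-in least square routine, and the residual is orthogonal to the columns of Z.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 30, 3

# illustrative complex design matrix and observations U = Z beta + e
Z = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
beta_true = np.array([1 - 1j, 0.5j, 2.0])
U = Z @ beta_true + 0.1 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

# normal equations: beta = (Z* Z)^{-1} Z* U, as in (43)
beta = np.linalg.solve(Z.conj().T @ Z, Z.conj().T @ U)

# same answer from numpy's least square routine
beta_lstsq, *_ = np.linalg.lstsq(Z, U, rcond=None)
assert np.allclose(beta, beta_lstsq)

# the residual is orthogonal to the columns of Z: Z*(U - Z beta) = 0
resid = U - Z @ beta
assert np.allclose(Z.conj().T @ resid, 0)
```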

Concluding Remarks
In this paper, we have introduced vector/matrix differential operators in the complex domain. These differential operators in the complex domain are believed to be new. With the help of these operators, we have examined the optimization of a linear form with a Hermitian-form constraint, the optimization of a Hermitian form with a linear-form as well as a Hermitian-form constraint, and the optimization of a bilinear form with Hermitian-form constraints, where the linear forms and bilinear forms involve vectors and matrices in the complex domain. As applications of these optimization problems, we have extended principal component analysis and canonical correlation analysis to the complex domain. Also extended to the complex domain are the Cramer–Rao inequality, the Cauchy–Schwarz inequality, minimum-variance unbiased estimation, and least square analysis. If we use the general definition of a density f(X) as a real-valued scalar function such that f(X) ≥ 0 in the domain of X and ∫_X f(X)dX = 1, where the argument X may be a scalar, vector, matrix, or a sequence of matrices in the real or complex domain [8], then the structures of the joint density, marginal density, conditional density, etc., will be parallel to those in the real domain. Then, we will be able to extend Bayesian analysis to the complex domain. One can also explore extending other multivariate statistical techniques, such as factor analysis, classification problems, cluster analysis, analysis of variance, analysis of covariance, etc., to the complex domain. These are some of the open problems. Since the likelihood function L is a product of densities at the observed sample point, in the simple random sample case, this L will be a real-valued scalar function. Then, one can extend the maximum likelihood method of estimation to the complex domain. For example, in the p-variate complex Gaussian case, for a simple random sample of size n, the maximum likelihood estimators of the mean value vector and the covariance matrix are the sample mean vector and (1/n)S, where S is the sample sum of products matrix.
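A minimal sketch of these complex Gaussian estimators, assuming numpy, a sample drawn for illustration with identity covariance, and the convention Σ = E[(X − μ)(X − μ)*]: the estimates are the sample mean vector and (1/n)S, and the estimated covariance matrix is Hermitian positive definite.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 2000, 3

# sample from a p-variate circular complex Gaussian with covariance I
mu = np.array([1 + 1j, -0.5j, 2.0])
X = mu + (rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))) / np.sqrt(2)

mu_hat = X.mean(axis=0)          # estimator of the mean value vector
D = X - mu_hat
S = D.T @ D.conj()               # sample sum of products matrix, sum_j d_j d_j*
Sigma_hat = S / n                # estimator of the covariance matrix

# Sigma_hat is Hermitian positive definite
assert np.allclose(Sigma_hat, Sigma_hat.conj().T)
assert np.linalg.eigvalsh(Sigma_hat).min() > 0
```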
One can also examine maximum likelihood estimation involving other scalar/vector/matrix-variate densities in the complex domain; this, too, remains open.