High-Dimensional Random Matrices from the Classical Matrix Groups, and Generalized Hypergeometric Functions of Matrix Argument

Results from the theory of the generalized hypergeometric functions of matrix argument, and the related zonal polynomials, are used to develop a new approach to studying the asymptotic distributions of linear functions of uniformly distributed random matrices from the classical compact matrix groups. In particular, we provide a new approach for proving the following result of D’Aristotile, Diaconis, and Newman: Let the random matrix $H_n$ be uniformly distributed according to Haar measure on the group of $n \times n$ orthogonal matrices, and let $A_n$ be a non-random $n \times n$ real matrix such that $\operatorname{tr}(A_n' A_n) = 1$. Then, as $n \to \infty$, $\sqrt{n}\,\operatorname{tr}(A_n H_n)$ converges in distribution to the standard normal distribution.


Introduction
The study of high-dimensional random orthogonal and unitary matrices can be traced to a famous paper of E. Borel [1] in which the following result is proved: Let $X_{1,n}$ denote the first coordinate of $X_n$, an $n$-dimensional random vector that is uniformly distributed on the unit sphere $S^{n-1}$; then, as $n \to \infty$, the random variable $\sqrt{n}\,X_{1,n}$ converges in distribution to $Z$, a standard normal random variable.
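Borel's result is easy to probe numerically. The following is a minimal sketch, not part of the original argument: it assumes the standard device of normalizing an isotropic Gaussian vector to obtain a uniform point on $S^{n-1}$, and the function names `scaled_first_coordinate` and `sample_moments` are illustrative only.

```python
import math
import random

def scaled_first_coordinate(n, rng):
    """Return sqrt(n) * X_1 for X uniformly distributed on the unit sphere S^{n-1}."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]   # isotropic Gaussian vector
    norm = math.sqrt(sum(x * x for x in g))        # g / |g| is uniform on the sphere
    return math.sqrt(n) * g[0] / norm

def sample_moments(n, draws, seed=0):
    """Sample mean, variance and fourth moment of sqrt(n) * X_{1,n}."""
    rng = random.Random(seed)
    xs = [scaled_first_coordinate(n, rng) for _ in range(draws)]
    mean = sum(xs) / draws
    var = sum((x - mean) ** 2 for x in xs) / draws
    fourth = sum(x ** 4 for x in xs) / draws
    return mean, var, fourth
```

For large $n$, a call such as `sample_moments(200, 4000)` should return values near the standard normal moments 0, 1 and 3, up to Monte Carlo error.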
In a survey of the literature, we were especially intrigued by a result of D'Aristotile, Diaconis and Newman [9]. We denote by $O(n)$ the group of $n \times n$ orthogonal matrices, and by the uniform distribution on $O(n)$ we mean the Haar measure, normalized to be a probability distribution. Further, we let $N(0,1)$ denote the standard normal distribution. Then the result is as follows:
Theorem 1.1. (D'Aristotile et al. [9]) Let $\{A_n : n = 1, 2, 3, \ldots\}$ be a sequence of real matrices such that $A_n$ is $n \times n$ and $\operatorname{tr}(A_n' A_n) = 1$, and let $H_n$ be a random orthogonal matrix that is uniformly distributed on $O(n)$. Then $\sqrt{n}\,\operatorname{tr}(A_n H_n) \xrightarrow{L} N(0,1)$ as $n \to \infty$.
The proof given by D'Aristotile et al. [9] is based on classical probabilistic methods involving tightness. Their result was later studied by Meckes [12], who obtained a bound on the distance, in the total variation metric on the set of probability distributions, between the distribution of $\sqrt{n}\,\operatorname{tr}(A_n H_n)$ and the standard normal distribution; as a consequence, Meckes obtained an explicit formula for the rate of convergence to normality.
It was particularly striking to us that, throughout the existing literature on high-dimensional random matrices from the classical compact matrix groups, the theory of generalized hypergeometric functions of matrix argument appears not to have played an explicit role. We found this absence intriguing because it has been known since the work of Herz [15] that the characteristic function of a uniformly distributed random orthogonal matrix can be expressed in terms of the Bessel functions of matrix argument; indeed, a primary motivation for the invention of those Bessel functions was the study of random matrices which are uniformly distributed on $O(n)$.
In this paper, we provide a new approach to proving Theorem 1.1. To that end, we will present crucial features of the theory of the zonal polynomials and a generalized hypergeometric function of matrix argument as necessary to make the paper self-contained. It is also noteworthy that the approach given here applies with ease, mutatis mutandis, to cases in which the matrix $H_n$ is uniformly distributed on the unitary group or the symplectic group, and to cases in which $H_n$ is a rectangular random matrix on Stiefel manifolds corresponding to the classical compact matrix groups. In short, the theory of the generalized hypergeometric functions of matrix argument lends itself readily to the study of linear functions of high-dimensional random matrices from the classical compact matrix groups.
Conversely, the study of high-dimensional orthogonal and unitary matrices also yields new results for the Bessel functions of matrix argument. By application of a result of Johansson [6], we will obtain an upper bound on the distance, in the supremum norm on $\mathbb{R}$, between a certain generalized hypergeometric function of scalar matrix argument and the Gaussian quantity, $\exp(-t^2/2)$, $t \in \mathbb{R}$.

Zonal Polynomials and a Generalized Hypergeometric Function of Matrix Argument
Throughout the paper, we denote the determinant and trace of a square matrix $A$ by $\det(A)$ and $\operatorname{tr}(A)$, respectively. We also denote by $I_n$ the identity matrix of order $n$. We denote by $E$ the generic operation of expectation with respect to a probability distribution which, on all occasions, will be explicit from the context.
A partition is a vector $\kappa = (\kappa_1, \ldots, \kappa_n)$ of non-negative integers that are weakly decreasing: $\kappa_1 \ge \cdots \ge \kappa_n \ge 0$. The entries $\kappa_1, \ldots, \kappa_n$ are called the parts of $\kappa$; the length of $\kappa$ is the number of non-zero $\kappa_j$; and the weight of $\kappa$ is $|\kappa| = \kappa_1 + \cdots + \kappa_n$. The set of partitions may be ordered lexicographically: If $\lambda = (\lambda_1, \ldots, \lambda_n)$ and $\kappa = (\kappa_1, \ldots, \kappa_n)$ are partitions then we write $\lambda < \kappa$ if $\lambda_j < \kappa_j$ for the first index $j$ such that corresponding parts are unequal.
We shall encounter in the sequel the quantity,
$$\rho_\kappa = \sum_{j=1}^{n} \kappa_j(\kappa_j - j). \tag{2.1}$$
Perhaps coincidentally, the term $\rho_\kappa$ has appeared previously in the theory of zonal polynomials. James [16], in proving that the zonal polynomial $Z_\kappa$ is an eigenfunction of the Laplace-Beltrami operator on the cone of positive definite matrices, shows that $\rho_\kappa$ appears in the expression for the corresponding eigenvalue; see also Muirhead [17] (p. 229, Equation (5)) and Richards [18]. We will also need the following monotonicity property of $\rho_\kappa$.
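Lemma 2.1. In the lexicographic ordering on the set of partitions of weight $k$, $\rho_\kappa$ is a strictly increasing function: $\rho_\lambda < \rho_\kappa$ for $\lambda < \kappa$. In particular,
$$-\tfrac12 k(k-1) \le \rho_\kappa \le k(k-1). \tag{2.2}$$
Proof. We shall use induction on $\lambda$ in the lexicographic ordering on the set of partitions. For the top two partitions, $(k)$ and $(k-1, 1)$, we find that
$$\rho_{(k)} - \rho_{(k-1,1)} = k(k-1) - \big[(k-1)(k-2) - 1\big] = 2k - 1 > 0.$$
As inductive hypothesis, suppose that the result has been proved for all partitions from $(k)$ down to a partition $\kappa = (\kappa_1, \ldots, \kappa_n)$. Then the partition $\lambda$ which is immediately below $\kappa$ is of the form
$$\lambda = (\kappa_1, \ldots, \kappa_{j-1}, \kappa_j - 1, \kappa_{j+1}, \ldots, \kappa_{l-1}, \kappa_l + 1, \kappa_{l+1}, \ldots, \kappa_n)$$
for some $j$ and $l$ with $j < l$. By comparing the $j$th and $l$th parts of $\lambda$ we also find that, necessarily, $\kappa_j - 1 \ge \kappa_l + 1$. By cancelling common terms in the sums that define $\rho_\kappa$ and $\rho_\lambda$, we obtain
$$\rho_\kappa - \rho_\lambda = \big[(\kappa_j - 1) - (\kappa_l + 1)\big] + \big[(\kappa_j - j) - (\kappa_l - l)\big].$$
We have seen already that $\kappa_j - 1 \ge \kappa_l + 1$. Further, since the sequence $\{\kappa_j\}$ is weakly decreasing, the sequence $\{\kappa_j - j\}$ is strictly decreasing, and hence $(\kappa_j - j) - (\kappa_l - l) > 0$ for $j < l$. Therefore, we obtain $\rho_\kappa - \rho_\lambda > 0$; consequently, by induction, the strictly-increasing property holds for all partitions of weight $k$.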
Because the set of partitions of weight $k$ is totally ordered with respect to the lexicographic ordering, with minimal element $(1^k) = (1, \ldots, 1)$, consisting of $k$ ones, and maximal element $(k)$, it follows from the monotonicity property of $\rho_\kappa$ that
$$\rho_{(1^k)} \le \rho_\kappa \le \rho_{(k)}$$
for all partitions $\kappa$ of weight $k$. Thus, we obtain Equation (2.2).
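The bounds in Equation (2.2) can be checked by brute force for small weights. The sketch below is illustrative rather than part of the proof: it assumes that $\rho_\kappa = \sum_j \kappa_j(\kappa_j - j)$, the form of the eigenvalue term in Muirhead [17] (p. 229), and the helper names `partitions`, `rho` and `check_bounds` are hypothetical.

```python
def partitions(k, max_part=None):
    """Yield the partitions of k as weakly decreasing tuples, largest first lexicographically."""
    if max_part is None or max_part > k:
        max_part = k
    if k == 0:
        yield ()
        return
    for first in range(max_part, 0, -1):
        for rest in partitions(k - first, first):
            yield (first,) + rest

def rho(kappa):
    """Assumed form of Equation (2.1): sum over j of kappa_j * (kappa_j - j)."""
    return sum(part * (part - j) for j, part in enumerate(kappa, start=1))

def check_bounds(k):
    """True if the extreme values of rho over weight k are -k(k-1)/2 and k(k-1)."""
    values = [rho(kappa) for kappa in partitions(k)]
    return min(values) == -(k * (k - 1) // 2) and max(values) == k * (k - 1)
```

Under these assumptions, `check_bounds(k)` compares the smallest and largest values of $\rho_\kappa$ over all partitions of weight $k$ with the two sides of Equation (2.2).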
For $a \in \mathbb{C}$ and any nonnegative integer $j$, the rising factorial $(a)_j$ is defined as
$$(a)_j = a(a+1)\cdots(a+j-1), \qquad (a)_0 = 1. \tag{2.3}$$
Corresponding to each partition $\kappa$, the partitional rising factorial $(a)_\kappa$ is defined as
$$(a)_\kappa = \prod_{j=1}^{n} \big(a - \tfrac12(j-1)\big)_{\kappa_j}. \tag{2.4}$$
Let $S$ be a real symmetric $n \times n$ matrix. For each partition $\kappa$, we denote by $Z_\kappa(S)$ the zonal polynomial of the matrix $S$. A complete description of the zonal polynomials may be obtained from James [19], Muirhead [17], or Gross and Richards [20]. Noting that the present paper deals directly with aspects of integration over the orthogonal group $O(n)$, we remark that a direct definition of the zonal polynomials may be obtained as follows: For any symmetric $n \times n$ matrix $S$, and for $j = 1, \ldots, n$, denote by $\det_j(S)$ the principal minor of order $j$ of $S$. Let
$$p_\kappa(S) = \big(\det\nolimits_n(S)\big)^{\kappa_n} \prod_{j=1}^{n-1} \big(\det\nolimits_j(S)\big)^{\kappa_j - \kappa_{j+1}} \tag{2.5}$$
be the power function corresponding to the partition $\kappa$. Denote by $dH_n$ the Haar measure on $O(n)$, normalized to be a probability measure. Then $Z_\kappa(S)$, the zonal polynomial corresponding to the partition $\kappa$, may be defined by
$$Z_\kappa(S) = c_\kappa \int_{O(n)} p_\kappa(H_n' S H_n)\, dH_n, \tag{2.6}$$
where the normalizing constants $c_\kappa$ are positive and are chosen uniquely so that
$$(\operatorname{tr} S)^k = \sum_{|\kappa| = k} Z_\kappa(S), \qquad k = 0, 1, 2, \ldots. \tag{2.7}$$
Integral representations of the type given in Equation (2.6) have played a crucial role in earlier studies of central limit theorems for positive definite random matrices (Richards [22]).
We now introduce a generalized hypergeometric function of matrix argument. Let $a \in \mathbb{C}$ be such that $-a + \tfrac12(j-1)$ is not a non-negative integer for all $j = 1, \ldots, n$. For any symmetric $n \times n$ matrix $S$, we define a generalized hypergeometric function of matrix argument,
$${}_0F_1(a; S) = \sum_{k=0}^{\infty} \frac{1}{k!} \sum_{|\kappa| = k} \frac{Z_\kappa(S)}{(a)_\kappa}, \tag{2.8}$$
where the inner summation is over all partitions $\kappa = (\kappa_1, \ldots, \kappa_n)$ of weight $k$. By a result of Gross and Richards [20], Theorem 6.3, the series in Equation (2.8) converges absolutely for all $S$. With $i \equiv \sqrt{-1}$, it is a result of Herz [15] (p. 423; see also James [19]) that for any $n \times n$ matrix $A$, there holds the integral formula,
$$\int_{O(n)} \exp\big(i \operatorname{tr}(A H_n)\big)\, dH_n = {}_0F_1\big(\tfrac{n}{2}; -\tfrac14 A'A\big). \tag{2.9}$$
This result generalizes a well-known formula that expresses a classical Bessel function as an integral over the unit circle; for this reason, the function ${}_0F_1(n/2; -S)$ also is viewed as a Bessel function of matrix argument.
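In the scalar case $n = 1$ the integral formula of Herz can be checked directly, which may help fix the normalization. The sketch below assumes the standard form $E \exp(i \operatorname{tr}(A H_n)) = {}_0F_1(n/2; -A'A/4)$; for $n = 1$ the group $O(1) = \{-1, +1\}$ carries the uniform two-point measure, the only partition of weight $k$ is $(k)$, and the series reduces to the classical ${}_0F_1(a; s) = \sum_k s^k / (k!\,(a)_k)$. The function names are illustrative.

```python
import math

def hyp0f1(a, s, terms=60):
    """Classical 0F1(a; s) = sum over k of s^k / (k! (a)_k), summed term by term."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= s / ((k + 1) * (a + k))   # ratio of consecutive terms in the series
    return total

def haar_average_o1(x):
    """Average of exp(i*x*h) over h in O(1) = {-1, +1}; the imaginary parts cancel."""
    return 0.5 * (math.cos(x) + math.cos(-x))
```

Under these assumptions, `hyp0f1(0.5, -x * x / 4)` and `haar_average_o1(x)` should agree to rounding error, reflecting the classical identity ${}_0F_1(\tfrac12; -x^2/4) = \cos x$.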

The Case of the Stiefel Manifold
We regard this section as preparatory for the ensuing new approach to Theorem 1.1, for the method of hypergeometric functions of matrix argument very easily yields the high-dimensional asymptotic behavior of random matrices taking values in Stiefel manifolds.
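Denote by $V_{n,m}$ the Stiefel manifold of all $m$-tuples of orthonormal $n$-dimensional vectors. As a homogeneous space, $V_{n,m} \simeq O(n)/O(n-m)$, hence is compact. An explicit description of the unique $O(n)$-invariant uniform distribution on $V_{n,m}$ is given by Herz [15]. The following result is both a generalization of Borel's result for the unit sphere and an analog of Theorem 1.1 for the Stiefel manifold.
Theorem 3.1. Let $m$ be a fixed positive integer, and let $\{A_n : n \ge m\}$ be a sequence of real matrices such that $A_n$ is $n \times m$ and $\operatorname{tr}(A_n' A_n) = 1$. For each $n \ge m$, let $H_n$ be an $n \times m$ random matrix that is uniformly distributed on $V_{n,m}$. Then
$$\sqrt{n}\,\operatorname{tr}(A_n' H_n) \xrightarrow{L} N(0,1) \quad \text{as } n \to \infty.$$
To see how this result is obtained, we apply Equation (2.9) to obtain, for $t \in \mathbb{R}$,
$$E \exp\big(i t \sqrt{n}\,\operatorname{tr}(A_n' H_n)\big) = {}_0F_1\big(\tfrac{n}{2}; -\tfrac14 n t^2 A_n' A_n\big) = \sum_{k=0}^{\infty} \frac{(-t^2/4)^k}{k!} \sum_{|\kappa| = k} \frac{n^k}{(n/2)_\kappa}\, Z_\kappa(A_n' A_n).$$
Because $A_n' A_n$ is an $m \times m$ matrix then, by Equation (2.6), $Z_\kappa(A_n' A_n) = 0$ if $\kappa$ has length greater than $m$; therefore, in this case, the zonal polynomial expansion involves partitions of length at most $m$ only. By Equation (2.4), we obtain for any such partition $\kappa$ of weight $k$,
$$(n/2)_\kappa = \prod_{j=1}^{m} \big(\tfrac12(n - j + 1)\big)_{\kappa_j} \sim (n/2)^k \quad \text{as } n \to \infty. \tag{3.1}$$
Therefore, $n^k/(n/2)_\kappa \sim 2^k$ for large $n$, and so, by Equation (2.7) and the condition $\operatorname{tr}(A_n' A_n) = 1$, we obtain
$$\lim_{n \to \infty} E \exp\big(i t \sqrt{n}\,\operatorname{tr}(A_n' H_n)\big) = \sum_{k=0}^{\infty} \frac{(-t^2/2)^k}{k!} = \exp(-t^2/2),$$
which establishes that $\sqrt{n}\,\operatorname{tr}(A_n' H_n)$ converges in distribution to $N(0,1)$.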
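The asymptotic relation $n^k/(n/2)_\kappa \sim 2^k$ can be inspected with exact rational arithmetic. The following sketch assumes the partitional rising factorial has the standard form $(a)_\kappa = \prod_j (a - \tfrac12(j-1))_{\kappa_j}$; the function names `rising`, `partitional_rising` and `ratio` are illustrative and not taken from the paper.

```python
from fractions import Fraction

def rising(a, j):
    """Classical rising factorial (a)_j = a (a+1) ... (a+j-1), with (a)_0 = 1."""
    out = Fraction(1)
    for i in range(j):
        out *= a + i
    return out

def partitional_rising(a, kappa):
    """Assumed form of Equation (2.4): product over j of (a - (j-1)/2)_{kappa_j}."""
    out = Fraction(1)
    for j, part in enumerate(kappa, start=1):
        out *= rising(Fraction(a) - Fraction(j - 1, 2), part)
    return out

def ratio(n, kappa):
    """(n/2)^k / (n/2)_kappa, where k is the weight of kappa."""
    k = sum(kappa)
    half_n = Fraction(n, 2)
    return half_n ** k / partitional_rising(half_n, kappa)
```

For a fixed partition, for example `ratio(10**6, (3, 2, 1))`, the value should be close to 1, which is the statement $n^k/(n/2)_\kappa \sim 2^k$ after dividing both sides by $2^k$.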
For general $A_n$, the argument given above also leads to the conclusion, We deduce, by applying the standard Cramér-Wold device, that for large $n$ the entries of the matrix $\sqrt{n}\,H_n$ are asymptotically multivariate normally distributed with mean $0$ and identity covariance matrix $I_n$. We note also that a similar conclusion may be obtained for the results to follow.

The Case of the Orthogonal Group
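We now present a new approach to Theorem 1.1. In this setting, $A_n$ is an $n \times n$ real matrix satisfying the condition $\operatorname{tr}(A_n' A_n) = 1$, and the random matrix $H_n \in O(n)$ is uniformly distributed. Then, for $t \in \mathbb{R}$, we again apply Equation (2.9) to deduce that the characteristic function of the random variable $\sqrt{n}\,\operatorname{tr}(A_n H_n)$ is
$$E \exp\big(i t \sqrt{n}\,\operatorname{tr}(A_n H_n)\big) = {}_0F_1\big(\tfrac{n}{2}; -\tfrac14 n t^2 A_n' A_n\big).$$
On expanding the ${}_0F_1$ function in a series of zonal polynomials, we obtain a generating function for the moments of the random variable $\sqrt{n}\,\operatorname{tr}(A_n H_n)$:
$$\sum_{k=0}^{\infty} \frac{(it)^k}{k!}\, E\big(\sqrt{n}\,\operatorname{tr}(A_n H_n)\big)^k = \sum_{k=0}^{\infty} \frac{(-t^2/4)^k}{k!} \sum_{|\kappa| = k} \frac{n^k}{(n/2)_\kappa}\, Z_\kappa(A_n' A_n).$$
On comparing the coefficients of like powers of $t$ we deduce that the odd moments vanish and that, for $k = 0, 1, 2, \ldots$,
$$E\big(\sqrt{n}\,\operatorname{tr}(A_n H_n)\big)^{2k} = \frac{(2k)!}{k!\, 2^{2k}}\, n^k \sum_{|\kappa| = k} \frac{Z_\kappa(A_n' A_n)}{(n/2)_\kappa}. \tag{4.1}$$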
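We now examine the asymptotic behavior of the $2k$th moment of $\sqrt{n}\,\operatorname{tr}(A_n H_n)$ as $n \to \infty$. For a partition $\kappa$ of weight $k$, the same argument used at Equation (3.1) shows that
$$(n/2)_\kappa = (n/2)^k \prod_{j=1}^{n} \prod_{i=1}^{\kappa_j} \Big(1 + \frac{2i - j - 1}{n}\Big).$$
Substituting this result into Equation (4.1), we obtain
$$E\big(\sqrt{n}\,\operatorname{tr}(A_n H_n)\big)^{2k} = \frac{(2k)!}{k!\, 2^k} \sum_{|\kappa| = k} Z_\kappa(A_n' A_n) \prod_{j=1}^{n} \prod_{i=1}^{\kappa_j} \Big(1 + \frac{2i - j - 1}{n}\Big)^{-1}.$$
By a Taylor-Maclaurin expansion, we obtain
$$\prod_{j=1}^{n} \prod_{i=1}^{\kappa_j} \Big(1 + \frac{2i - j - 1}{n}\Big)^{-1} = 1 - \frac{\rho_\kappa}{n} + O(n^{-2})$$
as $n \to \infty$, where
$$\rho_\kappa = \sum_{j=1}^{n} \sum_{i=1}^{\kappa_j} (2i - j - 1) = \sum_{j=1}^{n} \kappa_j(\kappa_j - j)$$
is the quantity first encountered at Equation (2.1). On applying Equation (2.7), we obtain
$$E\big(\sqrt{n}\,\operatorname{tr}(A_n H_n)\big)^{2k} = \frac{(2k)!}{k!\, 2^k} \Big[1 - \frac{1}{n} \sum_{|\kappa| = k} \rho_\kappa Z_\kappa(A_n' A_n) + O(n^{-2})\Big].$$
Hence, by applying Equation (2.2), and noting that $Z_\kappa(A_n' A_n) \ge 0$ because $A_n' A_n$ is positive semidefinite, we obtain an upper bound
$$\Big|\sum_{|\kappa| = k} \rho_\kappa Z_\kappa(A_n' A_n)\Big| \le k(k-1) \sum_{|\kappa| = k} Z_\kappa(A_n' A_n) = k(k-1),$$
which is not dependent on $n$. Therefore,
$$\frac{1}{n} \sum_{|\kappa| = k} \rho_\kappa Z_\kappa(A_n' A_n) \to 0,$$
and we conclude that for fixed $k$, as $n \to \infty$,
$$E\big(\sqrt{n}\,\operatorname{tr}(A_n H_n)\big)^{2k} \to \frac{(2k)!}{k!\, 2^k} = E(Z^{2k}),$$
where $Z \sim N(0,1)$. Finally, we apply the moment problem (Loève [23], p. 185) to deduce that $\sqrt{n}\,\operatorname{tr}(A_n H_n)$ converges in distribution to $N(0,1)$.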
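The moment convergence above can be illustrated by simulation. The sketch below is a hedged Monte Carlo check, not the method of the paper: it assumes that Gram-Schmidt orthonormalization of independent standard Gaussian rows yields a Haar-distributed orthogonal matrix, and the names `haar_orthogonal`, `frobenius_normalise`, `statistic` and `sample_even_moments` are hypothetical.

```python
import math
import random

def haar_orthogonal(n, rng):
    """Gram-Schmidt on i.i.d. Gaussian rows; the orthonormal rows form a Haar orthogonal matrix."""
    rows = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for u in rows:                                   # remove components along earlier rows
            dot = sum(x * y for x, y in zip(u, v))
            v = [y - dot * x for x, y in zip(u, v)]
        norm = math.sqrt(sum(y * y for y in v))
        rows.append([y / norm for y in v])
    return rows

def frobenius_normalise(a):
    """Rescale a real square matrix so that tr(A'A), its squared Frobenius norm, equals 1."""
    norm = math.sqrt(sum(x * x for row in a for x in row))
    return [[x / norm for x in row] for row in a]

def statistic(a, h):
    """sqrt(n) * tr(A H) for square matrices stored as lists of rows."""
    n = len(a)
    trace = sum(a[i][j] * h[j][i] for i in range(n) for j in range(n))
    return math.sqrt(n) * trace

def sample_even_moments(a, draws, seed=0):
    """Sample second and fourth moments of sqrt(n) * tr(A H_n) over Haar draws of H_n."""
    rng = random.Random(seed)
    n = len(a)
    xs = [statistic(a, haar_orthogonal(n, rng)) for _ in range(draws)]
    return sum(x ** 2 for x in xs) / draws, sum(x ** 4 for x in xs) / draws
```

For a matrix `a` returned by `frobenius_normalise`, the sample second moment should be close to 1 for every $n$, while the fourth moment approaches the Gaussian value 3 only as $n$ grows.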
We remark that the condition $\operatorname{tr}(A_n' A_n) = 1$ can be weakened to require only that $\operatorname{tr}(A_n' A_n) \to 1$, with a sufficiently fast rate of convergence, as $n \to \infty$.
It is also interesting to discover that the study of high-dimensional random orthogonal matrices yields a new inequality for the generalized hypergeometric function, ${}_0F_1$, of scalar matrix argument.
Proposition 4.1. There exist positive constants $c$ and $d$ such that, for all $n \ge 1$ and $t \in \mathbb{R}$,
Proof. Define the random variable $Y = \operatorname{tr} H_n$, where $H_n$ is uniformly distributed on $O(n)$. Denote by $g_Y$ and $\phi$ the probability density functions of $Y$ and the $N(0,1)$ random variable, respectively. By Johansson [6], Theorem 3.7(b), there exist positive constants $c$ and $d$ such that The proof is complete.

The Case of the Unitary Group
As we noted in the introduction, the method used in Section 4 produces similar results in the case of the unitary and symplectic groups. We shall present the details in the unitary case; as regards the symplectic case, which we leave to the reader, we note that necessary details on the zonal polynomials and generalized hypergeometric function may be obtained from the paper of Gross and Richards [20].
In the sequel, we denote by $A^*$ the adjoint of a complex matrix: $A^* = \bar{A}'$. The analog of Theorem 1.1 in the unitary case, due to Meckes [12], is the following:
Theorem 5.1. (Meckes [12]) Let $\{A_n : n = 1, 2, 3, \ldots\}$ be a sequence of complex matrices such that $A_n$ is $n \times n$ and $\operatorname{tr}(A_n^* A_n) = 1$ for all $n$. Let $H_n$ be a random unitary matrix which is uniformly distributed on $U(n)$. Then $\sqrt{2n}\,\operatorname{Re}\operatorname{tr}(A_n H_n) \xrightarrow{L} N(0,1)$ as $n \to \infty$.
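In this setting, we will need the analogs of the partitional rising factorial, the zonal polynomial, and the generalized hypergeometric function of matrix argument that pertain to the "complex" case; see James [19] or Gross and Richards [20,21]. Specifically, the partitional rising factorial is now defined as
$$[a]_\kappa = \prod_{j=1}^{n} (a - j + 1)_{\kappa_j},$$
where each $(a)_{\kappa_j}$ is a classical rising factorial as defined in Equation (2.3); the zonal polynomial is defined for any Hermitian $n \times n$ matrix $S$ as
$$\tilde{Z}_\kappa(S) = \tilde{c}_\kappa \int_{U(n)} p_\kappa(H_n^* S H_n)\, dH_n,$$
where the power function $p_\kappa$ is defined in Equation (2.5), and the normalizing constants $\tilde{c}_\kappa$ are positive and are chosen uniquely so that
$$(\operatorname{tr} S)^k = \sum_{|\kappa| = k} \tilde{Z}_\kappa(S), \qquad k = 0, 1, 2, \ldots;$$
and for any $a \in \mathbb{C}$ such that $-a + j - 1$ is not a non-negative integer for all $j = 1, \ldots, n$, the generalized hypergeometric function of matrix argument is defined as
$${}_0F_1(a; S) = \sum_{k=0}^{\infty} \frac{1}{k!} \sum_{|\kappa| = k} \frac{\tilde{Z}_\kappa(S)}{[a]_\kappa}.$$
Similar to the orthogonal case, the characteristic function of the random variable $\sqrt{2n}\,\operatorname{Re}\operatorname{tr}(A_n H_n)$ is
$$E \exp\big(i t \sqrt{2n}\,\operatorname{Re}\operatorname{tr}(A_n H_n)\big) = {}_0F_1\big(n; -\tfrac12 n t^2 A_n^* A_n\big),$$
where ${}_0F_1$ is a generalized hypergeometric function of Hermitian matrix argument. By expanding the ${}_0F_1$ function in a series of complex zonal polynomials, we obtain a generating function for the moments of the random variable $\sqrt{2n}\,\operatorname{Re}\operatorname{tr}(A_n H_n)$:
$$\sum_{k=0}^{\infty} \frac{(it)^k}{k!}\, E\big(\sqrt{2n}\,\operatorname{Re}\operatorname{tr}(A_n H_n)\big)^k = \sum_{k=0}^{\infty} \frac{(-t^2/2)^k}{k!} \sum_{|\kappa| = k} \frac{n^k}{[n]_\kappa}\, \tilde{Z}_\kappa(A_n^* A_n).$$
By comparing powers of $t$, we deduce that, for $k = 0, 1, 2, \ldots$,
$$E\big(\sqrt{2n}\,\operatorname{Re}\operatorname{tr}(A_n H_n)\big)^{2k} = \frac{(2k)!}{k!\, 2^k}\, n^k \sum_{|\kappa| = k} \frac{\tilde{Z}_\kappa(A_n^* A_n)}{[n]_\kappa}.$$
Proceeding as in Section 4, we have $n^k/[n]_\kappa = 1 - \tilde{\rho}_\kappa/(2n) + O(n^{-2})$ as $n \to \infty$, where
$$\tilde{\rho}_\kappa = \sum_{j=1}^{n} \kappa_j(\kappa_j - 2j + 1).$$
We can prove by means of an argument similar to that given in the proof of Lemma 2.1 that the coefficients $\tilde{\rho}_\kappa$ are strictly increasing in the lexicographic ordering on the set of partitions of weight $k$; therefore,
$$-k(k-1) = \tilde{\rho}_{(1^k)} \le \tilde{\rho}_\kappa \le \tilde{\rho}_{(k)} = k(k-1),$$
so we obtain
$$\Big|\sum_{|\kappa| = k} \tilde{\rho}_\kappa \tilde{Z}_\kappa(A_n^* A_n)\Big| \le k(k-1) \sum_{|\kappa| = k} \tilde{Z}_\kappa(A_n^* A_n) = k(k-1),$$
which is not dependent on $n$. Therefore,
$$\frac{1}{2n} \sum_{|\kappa| = k} \tilde{\rho}_\kappa \tilde{Z}_\kappa(A_n^* A_n) \to 0.$$
We conclude that for fixed $k$, as $n \to \infty$,
$$E\big(\sqrt{2n}\,\operatorname{Re}\operatorname{tr}(A_n H_n)\big)^{2k} \to \frac{(2k)!}{k!\, 2^k} = E(Z^{2k}),$$
where $Z \sim N(0,1)$. Finally, we apply the moment problem to deduce that $\sqrt{2n}\,\operatorname{Re}\operatorname{tr}(A_n H_n)$ converges in distribution to $N(0,1)$.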
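A simulation analogous to the orthogonal case gives a quick sanity check of the normalization $\sqrt{2n}\,\operatorname{Re}\operatorname{tr}(A_n H_n)$. This is a sketch under stated assumptions rather than the paper's method: it assumes that Gram-Schmidt applied to rows of independent standard complex Gaussian entries produces a Haar-distributed unitary matrix, and the names `haar_unitary`, `statistic` and `second_moment` are hypothetical.

```python
import math
import random

def haar_unitary(n, rng):
    """Gram-Schmidt on complex Gaussian rows; the orthonormal rows form a Haar unitary matrix."""
    rows = []
    for _ in range(n):
        v = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
        for u in rows:
            inner = sum(x.conjugate() * y for x, y in zip(u, v))   # Hermitian inner product <u, v>
            v = [y - inner * x for x, y in zip(u, v)]
        norm = math.sqrt(sum(abs(y) ** 2 for y in v))
        rows.append([y / norm for y in v])
    return rows

def statistic(a, h):
    """sqrt(2n) * Re tr(A H) for square complex matrices stored as lists of rows."""
    n = len(a)
    trace = sum(a[i][j] * h[j][i] for i in range(n) for j in range(n))
    return math.sqrt(2 * n) * trace.real

def second_moment(a, draws, seed=0):
    """Sample second moment of sqrt(2n) * Re tr(A H_n) over Haar draws of H_n."""
    rng = random.Random(seed)
    n = len(a)
    xs = [statistic(a, haar_unitary(n, rng)) for _ in range(draws)]
    return sum(x * x for x in xs) / draws
```

When `a` is scaled so that $\operatorname{tr}(A_n^* A_n) = 1$, the returned value should be close to 1, the variance of the limiting $N(0,1)$ distribution.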
We can also obtain an upper bound on the difference between the ${}_0F_1$ function of scalar matrix argument and the Gaussian quantity, $\exp(-t^2/2)$. The proof is similar to that of Proposition 4.1 and rests on an inequality of Johansson [6], Theorem 2.6(b).
Proposition 5.2. There exist positive constants $c$ and $d$ such that, for all $n \ge 1$ and $t \in \mathbb{R}$,
$$\big|\, {}_0F_1\big(n; -\tfrac12 t^2 I_n\big) - \exp(-t^2/2) \,\big| \le c\, n^{-dn}.$$