Closed-Form Representations of the Density Function and Integer Moments of the Sample Correlation Coefficient

This paper provides a simplified representation of the exact density function of R, the sample correlation coefficient. The odd and even moments of R are also obtained in closed forms. Being expressed in terms of generalized hypergeometric functions, the resulting representations are readily computable. Some numerical examples corroborate the validity of the results derived herein.


Introduction
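Given $\{(X_i, Y_i),\ i = 1, \ldots, n\}$, a simple random sample of size n from a bivariate normal distribution, the sample correlation coefficient,
$$R = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n\, S_X S_Y}, \qquad (1)$$
where $\bar{X} = \sum_{i=1}^{n} X_i/n$, $\bar{Y} = \sum_{i=1}^{n} Y_i/n$, $S_X^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2/n$ and $S_Y^2 = \sum_{i=1}^{n} (Y_i - \bar{Y})^2/n$, is the maximum likelihood estimator of $\rho_{X,Y}$, Pearson's product-moment correlation coefficient. Fisher [1] obtained a series representation of the density function of R, referred to as Equation (2) in the sequel, which converges for −1 < ρr < 1.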
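The definition of R can be illustrated with a minimal sketch. It assumes the usual form of the estimator in which the sample covariance and both sample variances are normalised by n; the function name `sample_correlation` is hypothetical and is introduced here only for illustration.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation coefficient R, using 1/n-normalised moments."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample covariance and standard deviations, all divided by n.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    if s_x == 0.0 or s_y == 0.0:
        raise ValueError("R is undefined when one of the samples is constant")
    return s_xy / (s_x * s_y)
```

Since the factors 1/n cancel in the ratio, the value of R is unchanged if n − 1 is used throughout instead of n.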
Closed-form representations of the exact density of R are derived in Section 2. They are given in terms of the generalized hypergeometric function,
$$ {}_pF_q(a_1, \ldots, a_p;\, b_1, \ldots, b_q;\, z) = \sum_{k=0}^{\infty} \frac{(a_1)_k \cdots (a_p)_k}{(b_1)_k \cdots (b_q)_k}\, \frac{z^k}{k!}, \qquad (3)$$
where, for example, $(a_1)_k = \Gamma(a_1 + k)/\Gamma(a_1)$. More specifically, it will be shown that, for −1 < ρr < 1, the exact density of R admits the closed-form representation referred to as Equation (4), which simplifies to an expression involving the beta function. For various results on the hypergeometric function $_2F_1(a, b; c; z)$ and its main properties, the reader is referred to Olver et al. [2], Chapter 15. Closed-form representations of the odd and even moments of R are provided in Section 3 and some numerical examples are included in Section 4.
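The series definition of the generalized hypergeometric function can be sketched in a few lines. This is a plain term-by-term summation, adequate for |z| < 1 when p = q + 1 and for any z when p ≤ q; it is not the algorithm used by Mathematica or by any particular library, and the name `hyp_pfq` is hypothetical.

```python
def hyp_pfq(a_params, b_params, z, tol=1e-15, max_terms=100000):
    """Sum the pFq series, building each term from the ratio of successive terms.

    The ratio of term k+1 to term k is
        prod(a + k) / prod(b + k) * z / (k + 1),
    which avoids evaluating gamma functions or factorials explicitly.
    Lower parameters must not be zero or negative integers.
    """
    term = 1.0   # term for k = 0
    total = 1.0
    for k in range(max_terms):
        ratio = z / (k + 1)
        for a in a_params:
            ratio *= a + k
        for b in b_params:
            ratio /= b + k
        term *= ratio
        total += term
        if abs(term) <= tol * abs(total):
            return total
    raise ArithmeticError("series did not converge within max_terms")
```

For instance, `hyp_pfq([1.0, 1.0], [2.0], 0.5)` evaluates 2F1(1, 1; 2; 1/2), which is known to equal −ln(1 − z)/z at z = 1/2.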
Fisher's Z-transform is a well-known transformation of R whose associated approximate normal distribution is known to present some shortcomings, especially when the sample size is small and |ρ| is large, in which case the distribution of R is markedly skewed. Winterbottom [3] showed that the normal approximation requires large sample sizes to be valid. It is also known that, in the bivariate normal case, the asymptotic variance of Fisher's Z statistic does not depend on ρ. Furthermore, as pointed out by Hotelling [4], the variance of R changes with the mean. The density and moment expressions derived in this paper remain accurate for any values of ρ and n.

The Exact Density of R
It should be noted that the series representation of the density function of R given in Equation (2) converges very slowly. It was indeed observed that, in certain instances, more than 1000 terms may be necessary to reach convergence. Closed-form representations of the exact density function of R are derived in this section.
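The slow convergence can be explored with a minimal sketch of a truncated series. The form of the series used below is the one quoted in standard references for a sample of size n from a bivariate normal population; it is assumed, not confirmed, to coincide with Equation (2), and the function name `fisher_series_pdf` is hypothetical. Terms are accumulated on the log scale to avoid overflow of the gamma functions.

```python
import math


def fisher_series_pdf(r, n, rho, terms):
    """Truncated series for the density of R (assumed form):

        f(r) = 2^(n-3) (1-rho^2)^((n-1)/2) (1-r^2)^((n-4)/2) / (pi Gamma(n-2))
               * sum_{k>=0} Gamma((n-1+k)/2)^2 (2 rho r)^k / k!
    """
    if not (-1.0 < r < 1.0 and -1.0 < rho < 1.0 and n >= 3):
        raise ValueError("need |r| < 1, |rho| < 1 and n >= 3")
    log_c = ((n - 3) * math.log(2.0)
             + 0.5 * (n - 1) * math.log1p(-rho * rho)
             + 0.5 * (n - 4) * math.log1p(-r * r)
             - math.log(math.pi) - math.lgamma(n - 2))
    x = 2.0 * rho * r
    total = 0.0
    for k in range(terms):
        if x == 0.0 and k > 0:
            break  # only the k = 0 term is non-zero
        log_mag = 2.0 * math.lgamma(0.5 * (n - 1 + k)) - math.lgamma(k + 1)
        if k > 0:
            log_mag += k * math.log(abs(x))
        sign = -1.0 if (x < 0.0 and k % 2 == 1) else 1.0
        # The constant is folded into the exponent so each term stays in range.
        total += sign * math.exp(log_c + log_mag)
    return total


# Partial sums for a case in which rho * r is close to 1.
for m in (100, 500, 1000, 5000):
    print(m, fisher_series_pdf(-0.99, 10, -0.97, m))
```

The summand behaves roughly like a power of k times (ρr)^k, so that the number of terms needed grows quickly as |ρr| approaches 1. When ρr < 0 the series alternates, and a double-precision evaluation such as this one may additionally suffer from cancellation.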
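First, we note that the gamma-function identity needed in what follows can be established by re-expressing the Legendre duplication formula, $\Gamma(z)\,\Gamma(z + \tfrac{1}{2}) = 2^{1-2z}\sqrt{\pi}\,\Gamma(2z)$. In order to prove that the representation of the density function of R given in Equation (4) is equivalent to the series representation (2), it suffices to match the terms of the two expansions separately for odd and even values of the summation index k. Letting k = 2j + 1, the required equality is first established when k is odd, which, in view of Equation (8), proves the result in that case.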
We now turn to the case k = 2i. The corresponding result is established by applying identity (7) wherein k is replaced by k − 1. Thus, one has the closed-form representation of the exact density function of R stated in Equation (4). A simplified representation of this expression can be obtained by making use of an identity listed under "Quadratic transformations with fixed a, b, z" on the Wolfram website, http://functions.wolfram.com/HypergeometricFunctions/Hypergeometric2F1/17/02/10/. Making the appropriate substitutions in that identity, multiplying both sides by a suitable factor and letting k = n − 1 in Equation (6) yield a second form of the exact density function of R. A final rewriting of this form gives the representation of the density function of R that is used in the numerical examples. Incidentally, this expression is more compact than that proposed by Hotelling [4].
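The odd/even decomposition underlying this argument can be illustrated by a short worked computation. As a sketch only, assume that the summand of the series is $\Gamma^2((n-1+k)/2)\,(2\rho r)^k/k!$, which is the form quoted in standard references; the notation and arrangement of constants in Equations (2) and (4) may differ. The two factorial identities used below follow from the Legendre duplication formula.

```latex
% Factorial identities in terms of Pochhammer symbols:
%   (2i)!   = 4^i \, i! \, (1/2)_i ,
%   (2j+1)! = 4^j \, j! \, (3/2)_j .
\begin{align*}
\sum_{k=0}^{\infty} \Gamma^2\!\Big(\frac{n-1+k}{2}\Big) \frac{(2\rho r)^k}{k!}
 &= \sum_{i=0}^{\infty} \Gamma^2\!\Big(\frac{n-1}{2}+i\Big) \frac{(2\rho r)^{2i}}{(2i)!}
  + \sum_{j=0}^{\infty} \Gamma^2\!\Big(\frac{n}{2}+j\Big) \frac{(2\rho r)^{2j+1}}{(2j+1)!} \\
 &= \Gamma^2\!\Big(\frac{n-1}{2}\Big) \sum_{i=0}^{\infty}
      \frac{\big(\frac{n-1}{2}\big)_i^2}{\big(\frac{1}{2}\big)_i} \frac{(\rho r)^{2i}}{i!}
  + 2\rho r\, \Gamma^2\!\Big(\frac{n}{2}\Big) \sum_{j=0}^{\infty}
      \frac{\big(\frac{n}{2}\big)_j^2}{\big(\frac{3}{2}\big)_j} \frac{(\rho r)^{2j}}{j!} \\
 &= \Gamma^2\!\Big(\frac{n-1}{2}\Big)\,
      {}_2F_1\!\Big(\frac{n-1}{2}, \frac{n-1}{2};\, \frac{1}{2};\, \rho^2 r^2\Big)
  + 2\rho r\, \Gamma^2\!\Big(\frac{n}{2}\Big)\,
      {}_2F_1\!\Big(\frac{n}{2}, \frac{n}{2};\, \frac{3}{2};\, \rho^2 r^2\Big).
\end{align*}
```

Under the stated assumption, the infinite series thus reduces to two Gauss hypergeometric functions of argument ρ²r², each of which converges for −1 < ρr < 1.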

Closed Forms for the Moments of R
It is shown in this section that the moments of R can also be expressed in closed form. Series representations of the moments are available in Anderson [5], pp. 151-152: Equation (17) applies when k is odd and Equation (18) when k is even. We will show that the moments admit the closed-form representations given in Equation (19) when k is odd and in Equation (20) when k is even, where the generalized hypergeometric function, $_pF_q(a_1, \ldots, a_p; b_1, \ldots, b_q; z)$, is as defined in Equation (3). When k is odd, expanding the hypergeometric function in Equation (19) as a series yields an expression that is seen to be equal to that given in Equation (17). Similarly, when k is even, expanding Equation (20) yields an expression which turns out to be equal to the right-hand side of Equation (18) on making use of the expression for (2i)! that was established earlier.
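A series for the k-th moment can be sketched by integrating an assumed form of the density series term by term against r^k, each integral being a beta function. The resulting expression is not claimed to be written in the same form as Equations (17) to (20). For comparison, the sketch also evaluates a well-known closed form of the first moment, E(R) = 2ρ/(n − 1) [Γ(n/2)/Γ((n − 1)/2)]² 2F1(1/2, 1/2; (n + 1)/2; ρ²). All function names are hypothetical.

```python
import math


def moment_series(k, n, rho, terms=2000):
    """E(R^k) from term-by-term integration of the assumed density series.

    Only indices j with j + k even contribute, each through
    B((j + k + 1)/2, (n - 2)/2).
    """
    log_c = ((n - 3) * math.log(2.0) + 0.5 * (n - 1) * math.log1p(-rho * rho)
             - math.log(math.pi) - math.lgamma(n - 2))
    total = 0.0
    for j in range(terms):
        if (j + k) % 2 == 1:
            continue  # odd integrand over (-1, 1) integrates to zero
        if rho == 0.0 and j > 0:
            break
        log_beta = (math.lgamma(0.5 * (j + k + 1)) + math.lgamma(0.5 * (n - 2))
                    - math.lgamma(0.5 * (j + k + n - 1)))
        log_mag = (log_c + 2.0 * math.lgamma(0.5 * (n - 1 + j))
                   - math.lgamma(j + 1) + log_beta)
        if j > 0:
            log_mag += j * math.log(abs(2.0 * rho))
        sign = -1.0 if (rho < 0.0 and j % 2 == 1) else 1.0
        total += sign * math.exp(log_mag)
    return total


def hyp2f1(a, b, c, z, tol=1e-16, max_terms=100000):
    """Gauss hypergeometric series, summed term by term (|z| < 1)."""
    term = total = 1.0
    for i in range(max_terms):
        term *= (a + i) * (b + i) / ((c + i) * (i + 1)) * z
        total += term
        if abs(term) <= tol * abs(total):
            return total
    raise ArithmeticError("series did not converge within max_terms")


def first_moment_closed_form(n, rho):
    """Well-known closed form of E(R); used here only as a cross-check."""
    ratio = math.exp(2.0 * (math.lgamma(0.5 * n) - math.lgamma(0.5 * (n - 1))))
    return 2.0 * rho / (n - 1) * ratio * hyp2f1(0.5, 0.5, 0.5 * (n + 1), rho * rho)
```

For example, `moment_series(1, 10, 0.5)` and `first_moment_closed_form(10, 0.5)` can be compared directly. The closed form needs far fewer terms because its hypergeometric series has argument ρ² and rapidly decaying coefficients.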

Numerical Examples
When the series representations of the density function or the moments of R are utilized, the number of terms required to achieve convergence depends on the length of the observation vector, the underlying correlation coefficient and the point at which the density function is evaluated in the former case or the order of the required moment in the latter. In certain instances, even 1000 terms turn out to be insufficient. The proposed closed-form expressions, which for all intents and purposes produce exact numerical results, can be evaluated much more quickly.
Consider, for example, the case n = 10 and ρ = −0.97. Table 1 reports the values of the probability density function (PDF) of R, first determined from f(r) as specified by Equation (2), truncated to 500 and 1000 terms, and then from g(r), the exact closed-form representation given in Equation (16), for r = −0.99, −0.25, 0.05, 0.25, 0.95. Similarly, when n = 75 and ρ = 0.80, one obtains the numerical results appearing in Table 2.
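A comparison of this kind can be sketched in double precision as follows. Since Equation (16) is not reproduced here, the closed form used as the reference is Hotelling's well-known representation rather than g(r), and the truncated series is the assumed standard form of Fisher's series; both function names are hypothetical. Only points with ρr > 0 are evaluated, because the alternating series obtained when ρr < 0 is prone to cancellation in floating-point arithmetic.

```python
import math


def fisher_series_pdf(r, n, rho, terms):
    """Truncated series for the density of R (assumed standard form)."""
    log_c = ((n - 3) * math.log(2.0) + 0.5 * (n - 1) * math.log1p(-rho * rho)
             + 0.5 * (n - 4) * math.log1p(-r * r)
             - math.log(math.pi) - math.lgamma(n - 2))
    x = 2.0 * rho * r
    total = 0.0
    for k in range(terms):
        if x == 0.0 and k > 0:
            break
        log_mag = 2.0 * math.lgamma(0.5 * (n - 1 + k)) - math.lgamma(k + 1)
        if k > 0:
            log_mag += k * math.log(abs(x))
        sign = -1.0 if (x < 0.0 and k % 2 == 1) else 1.0
        total += sign * math.exp(log_c + log_mag)
    return total


def hyp2f1(a, b, c, z, tol=1e-16, max_terms=100000):
    """Gauss hypergeometric series, summed term by term (|z| < 1)."""
    term = total = 1.0
    for i in range(max_terms):
        term *= (a + i) * (b + i) / ((c + i) * (i + 1)) * z
        total += term
        if abs(term) <= tol * abs(total):
            return total
    raise ArithmeticError("series did not converge within max_terms")


def hotelling_pdf(r, n, rho):
    """Hotelling's closed form of the density of R (stand-in reference)."""
    log_c = (math.log(n - 2) + math.lgamma(n - 1)
             - 0.5 * math.log(2.0 * math.pi) - math.lgamma(n - 0.5)
             + 0.5 * (n - 1) * math.log1p(-rho * rho)
             + 0.5 * (n - 4) * math.log1p(-r * r)
             - (n - 1.5) * math.log1p(-rho * r))
    return math.exp(log_c) * hyp2f1(0.5, 0.5, n - 0.5, 0.5 * (1.0 + rho * r))


# n = 10 and rho = -0.97, at the two tabulated points for which rho * r > 0.
for r in (-0.99, -0.25):
    row = [fisher_series_pdf(r, 10, -0.97, m) for m in (500, 1000)]
    print(r, row, hotelling_pdf(r, 10, -0.97))
```

The hypergeometric function in the closed form is evaluated at (1 + ρr)/2 with coefficients that decay rapidly, which is why a few dozen terms suffice where the original series may need well over a thousand.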
Table 2. PDF of R as evaluated from f(r) truncated to m terms and g(r).
Certain moments of R are included in Table 3 for some values of k, n and ρ, along with the computing times associated with the evaluation of the truncated series representations of the moments given in Equations (17) and (18) and the closed-form representations specified by Equations (19) and (20). We observed that the computing times can be significantly reduced by making use of the closed-form expressions. All the calculations were carried out with the symbolic computing software Mathematica, the code being available from the author upon request.

Table 1. PDF of R evaluated from f(r) truncated to m terms and g(r).