Article

Closed-Form Representations of the Density Function and Integer Moments of the Sample Correlation Coefficient

Serge B. Provost

Department of Statistical & Actuarial Sciences, The University of Western Ontario, London, ON N6A 5B7, Canada
Axioms 2015, 4(3), 268-274; https://doi.org/10.3390/axioms4030268
Submission received: 7 May 2015 / Revised: 8 June 2015 / Accepted: 18 June 2015 / Published: 20 July 2015

Abstract

This paper provides a simplified representation of the exact density function of $R$, the sample correlation coefficient. The odd and even moments of $R$ are also obtained in closed forms. Being expressed in terms of generalized hypergeometric functions, the resulting representations are readily computable. Some numerical examples corroborate the validity of the results derived herein.

1. Introduction

Given $\{(X_i, Y_i),\ i = 1, \ldots, n\}$, a simple random sample of size $n$ from a bivariate normal distribution, the sample correlation coefficient,
$$R = \frac{1}{n} \sum_{i=1}^{n} \left(\frac{X_i - \bar{X}}{S_X}\right) \left(\frac{Y_i - \bar{Y}}{S_Y}\right) \tag{1}$$
where $\bar{X} = \sum_{i=1}^{n} X_i / n$, $\bar{Y} = \sum_{i=1}^{n} Y_i / n$, $S_X^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 / n$ and $S_Y^2 = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 / n$, is the maximum likelihood estimator of $\rho_{X,Y}$, Pearson’s product-moment correlation coefficient. Fisher [1] obtained the following series representation of the density function of $R$:
$$f_R(r) = \frac{2^{n-3}}{\pi\,(n-3)!}\,(1-\rho^2)^{\frac{n-1}{2}}\,(1-r^2)^{\frac{n-4}{2}} \sum_{i=0}^{\infty} \Gamma^2\!\left(\frac{n+i-1}{2}\right) \frac{(2\rho r)^i}{i!} \tag{2}$$
which converges for $-1 < \rho r < 1$.
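As a rough illustration of how partial sums of the series in Equation (2) might be evaluated, the following Python sketch accumulates the first `m` terms on the log scale using only the standard library. The function name `fisher_series_pdf` and the truncation parameter `m` are illustrative choices and do not come from the paper.

```python
import math

def fisher_series_pdf(r, n, rho, m):
    """Partial sum (first m terms) of Fisher's series for the density of R.

    Assumes |r| < 1, |rho| < 1 and an integer sample size n >= 3.
    """
    # Logarithm of the factor in front of the sum; (n - 3)! = Gamma(n - 2).
    log_pref = ((n - 3) * math.log(2.0) - math.log(math.pi) - math.lgamma(n - 2)
                + 0.5 * (n - 1) * math.log1p(-rho * rho)
                + 0.5 * (n - 4) * math.log1p(-r * r))
    x = 2.0 * rho * r
    total = 0.0
    for i in range(m):
        if x == 0.0 and i > 0:
            break  # every term beyond i = 0 vanishes when rho * r = 0
        # log of Gamma((n + i - 1)/2)^2 / i!
        log_mag = 2.0 * math.lgamma(0.5 * (n + i - 1)) - math.lgamma(i + 1)
        if i > 0:
            log_mag += i * math.log(abs(x))
        sign = -1.0 if (x < 0.0 and i % 2 == 1) else 1.0
        total += sign * math.exp(log_pref + log_mag)
    return total

# Partial sums for a case where rho * r is close to 1.
for m in (50, 500, 1000):
    print(m, fisher_series_pdf(0.99, 10, 0.97, m))
```

Since the ratio of successive terms tends to $\rho r$, the partial sums settle slowly when $|\rho r|$ is close to 1, which is the situation the closed-form representations are meant to address.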
Closed-form representations of the exact density of $R$ are derived in Section 2. They are given in terms of the generalized hypergeometric function,
$${}_pF_q(a_1, \ldots, a_p;\, b_1, \ldots, b_q;\, z) = \sum_{k=0}^{\infty} \frac{(a_1)_k \cdots (a_p)_k}{(b_1)_k \cdots (b_q)_k}\, \frac{z^k}{k!} \tag{3}$$
where, for example, $(a_1)_k = \Gamma(a_1 + k) / \Gamma(a_1)$. More specifically, it will be shown that the exact density of $R$ can be expressed as
$$g(r) = \frac{2^{n-3}}{\pi\,(n-3)!}\,(1-\rho^2)^{\frac{n-1}{2}}\,(1-r^2)^{\frac{n-4}{2}} \times \left[\Gamma^2\!\left(\frac{n-1}{2}\right) {}_2F_1\!\left(\frac{n-1}{2}, \frac{n-1}{2};\, \frac{1}{2};\, \rho^2 r^2\right) + 2\rho r\, \Gamma^2\!\left(\frac{n}{2}\right) {}_2F_1\!\left(\frac{n}{2}, \frac{n}{2};\, \frac{3}{2};\, \rho^2 r^2\right)\right] \tag{4}$$
for $-1 < \rho r < 1$, which simplifies to
$$g(r) = \kappa(n, \rho)\,(1-r^2)^{\frac{n}{2}-2}\; {}_2F_1\!\left(n-1,\, n-1;\, n-\tfrac{1}{2};\, (1+\rho r)/2\right) \tag{5}$$
where $\kappa(n, \rho) = \left[(n-2)\, B^2\!\left(\frac{n-1}{2}, \frac{n}{2}\right) (1-\rho^2)^{\frac{n-1}{2}}\right] / \left[\pi\, 2^{n+1}\, B(n-1, n)\right]$, $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$ denoting the beta function. For various results on the hypergeometric function ${}_2F_1(a, b; c; z)$ and its main properties, the reader is referred to Olver et al. [2], Chapter 15. Closed-form representations of the odd and even moments of $R$ are provided in Section 3 and some numerical examples are included in Section 4.
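The single-hypergeometric representation in Equation (5) can be evaluated along the following lines. This is a minimal sketch that sums the Gauss series directly and works with logarithms of the constants; the helper names (`hyp2f1_series`, `log_beta`, `closed_form_pdf`) are hypothetical, and the paper itself relies on Mathematica's built-in hypergeometric functions rather than on this summation.

```python
import math

def hyp2f1_series(a, b, c, z, tol=1e-16, max_terms=100000):
    """Sum the Gauss series for 2F1(a, b; c; z), valid for |z| < 1."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1.0)) * z
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def closed_form_pdf(r, n, rho):
    """Density of R from Equation (5), for |r| < 1, |rho| < 1 and n > 2."""
    log_kappa = (math.log(n - 2.0)
                 + 2.0 * log_beta(0.5 * (n - 1), 0.5 * n)
                 + 0.5 * (n - 1) * math.log1p(-rho * rho)
                 - math.log(math.pi) - (n + 1) * math.log(2.0)
                 - log_beta(n - 1.0, float(n)))
    z = 0.5 * (1.0 + rho * r)
    series = hyp2f1_series(n - 1.0, n - 1.0, n - 0.5, z)
    return math.exp(log_kappa + (0.5 * n - 2.0) * math.log1p(-r * r)
                    + math.log(series))

print(closed_form_pdf(0.3, 10, 0.5))
```

The direct summation is adequate for moderate $n$ and for $\rho r$ not too close to 1; a library routine for ${}_2F_1$ would be the more robust choice when $(1+\rho r)/2$ approaches 1.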
Fisher’s $Z$-transform is a well-known transformation of $R$ whose associated approximate normal distribution is known to present some shortcomings, especially when the sample size is small and $|\rho|$ is large, in which case the distribution of $R$ is markedly skewed. Winterbottom [3] showed that the normal approximation requires large sample sizes to be valid. It is also known that, in the bivariate normal case, the asymptotic variance of Fisher’s $Z$ statistic does not depend on $\rho$. Furthermore, as pointed out by Hotelling [4], the variance of $R$ changes with the mean. The density and moment expressions derived in this paper remain accurate for any values of $\rho$ and $n$.

2. The Exact Density of R

It should be noted that the series representation of the density function of R given in Equation (2) converges very slowly. It was indeed observed that, in certain instances, more than 1000 terms may be necessary to reach convergence. Closed-form representations of the exact density function of R are derived in this section.
First, we note that the identity,
$$\frac{\Gamma(1/2)}{k!\;\Gamma(1/2 + k)} = \frac{2^{2k}}{(2k)!} \tag{6}$$
can be established by re-expressing the Legendre duplication formula,
$$\Gamma(2k) = \pi^{-1/2}\, 2^{2k-1}\, \Gamma(k)\, \Gamma(k + 1/2)$$
as
$$[2k\, \Gamma(2k)] = (\Gamma(1/2))^{-1}\, 2^{2k}\, [k\, \Gamma(k)]\, \Gamma(1/2 + k)$$
Moreover, since $\Gamma(3/2 + k) = (1/2 + k)\,\Gamma(1/2 + k) = (1/2)(2k+1)\,\Gamma(1/2 + k)$ and $\Gamma(3/2) = (1/2)\,\Gamma(1/2)$, it follows from Equation (6) that
$$\frac{\Gamma(3/2)}{k!\;\Gamma(3/2 + k)} = \frac{2^{2k}}{(2k+1)!} \tag{8}$$
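The two gamma-function identities, Equations (6) and (8), are easy to spot-check numerically. The short Python sketch below does so for the first few values of $k$; the function names are illustrative only.

```python
import math

def lhs_identity_6(k):
    """Gamma(1/2) / (k! * Gamma(1/2 + k))."""
    return math.gamma(0.5) / (math.factorial(k) * math.gamma(0.5 + k))

def rhs_identity_6(k):
    """2^(2k) / (2k)!."""
    return 2.0 ** (2 * k) / math.factorial(2 * k)

def lhs_identity_8(k):
    """Gamma(3/2) / (k! * Gamma(3/2 + k))."""
    return math.gamma(1.5) / (math.factorial(k) * math.gamma(1.5 + k))

def rhs_identity_8(k):
    """2^(2k) / (2k + 1)!."""
    return 2.0 ** (2 * k) / math.factorial(2 * k + 1)

for k in range(6):
    print(k, lhs_identity_6(k), rhs_identity_6(k),
          lhs_identity_8(k), rhs_identity_8(k))
```

In each row the first pair of values and the second pair of values should agree up to floating-point rounding.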
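In order to prove that the representation of the density function of $R$ given in Equation (4) is equivalent to the series representation (2), it suffices to show that
$$\sum_{k=0}^{\infty} \frac{(2r\rho)^k}{k!}\, \Gamma^2\!\left[\frac{k+n-1}{2}\right] = \Gamma^2\!\left(\frac{n}{2} - \frac{1}{2}\right) {}_2F_1\!\left(\frac{n}{2} - \frac{1}{2}, \frac{n}{2} - \frac{1}{2};\, \frac{1}{2};\, r^2\rho^2\right) + 2r\rho\, \Gamma^2\!\left(\frac{n}{2}\right) {}_2F_1\!\left(\frac{n}{2}, \frac{n}{2};\, \frac{3}{2};\, r^2\rho^2\right)$$
Now, letting $k = 2j + 1$, we establish that when $k$ is odd,
$$2r\rho \sum_{j=0}^{\infty} \frac{(2r\rho)^{2j}}{(2j+1)!}\, \Gamma^2\!\left[\frac{2j+n}{2}\right] = 2r\rho\, \Gamma^2\!\left(\frac{n}{2}\right) {}_2F_1\!\left(\frac{n}{2}, \frac{n}{2};\, \frac{3}{2};\, r^2\rho^2\right)$$
Note that
$$2r\rho \sum_{j=0}^{\infty} \frac{(2r\rho)^{2j}}{(2j+1)!}\, \Gamma^2\!\left(j + \frac{n}{2}\right) = 2r\rho \sum_{j=0}^{\infty} (r\rho)^{2j}\, \Gamma\!\left(j + \frac{n}{2}\right) \Gamma\!\left(j + \frac{n}{2}\right) \frac{2^{2j}}{(2j+1)!}$$
However,
$$\begin{aligned} 2r\rho\, \Gamma^2\!\left(\frac{n}{2}\right) {}_2F_1\!\left(\frac{n}{2}, \frac{n}{2};\, \frac{3}{2};\, r^2\rho^2\right) &= 2r\rho\, \Gamma^2\!\left(\frac{n}{2}\right) \sum_{j=0}^{\infty} \frac{\Gamma(\frac{n}{2}+j)\, \Gamma(\frac{n}{2}+j)\, \Gamma(\frac{3}{2})}{\Gamma(\frac{n}{2})\, \Gamma(\frac{n}{2})\, \Gamma(\frac{3}{2}+j)}\, \frac{(r^2\rho^2)^j}{j!} \\ &= 2r\rho \sum_{j=0}^{\infty} \Gamma(n/2+j)\, \Gamma(n/2+j)\, (r^2\rho^2)^j\, \frac{\Gamma(3/2)}{j!\; \Gamma(3/2+j)} \end{aligned}$$
which, in view of Equation (8), proves the result.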
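We now show that when $k = 2i$,
$$\sum_{i=0}^{\infty} \frac{(2r\rho)^{2i}}{(2i)!}\, \Gamma^2\!\left(\frac{2i+n-1}{2}\right) = {}_2F_1\!\left(\frac{n}{2} - \frac{1}{2}, \frac{n}{2} - \frac{1}{2};\, \frac{1}{2};\, r^2\rho^2\right) \Gamma^2\!\left(\frac{n}{2} - \frac{1}{2}\right)$$
First, note that
$${}_2F_1\!\left(\frac{n}{2} - \frac{1}{2}, \frac{n}{2} - \frac{1}{2};\, \frac{1}{2};\, r^2\rho^2\right) \Gamma^2\!\left(\frac{n}{2} - \frac{1}{2}\right) = \Gamma^2\!\left(\frac{n}{2} - \frac{1}{2}\right) \sum_{i=0}^{\infty} \frac{\Gamma(\frac{n}{2} - \frac{1}{2} + i)\, \Gamma(\frac{n}{2} - \frac{1}{2} + i)\, \Gamma(\frac{1}{2})}{\Gamma(\frac{n}{2} - \frac{1}{2})\, \Gamma(\frac{n}{2} - \frac{1}{2})\, \Gamma(\frac{1}{2} + i)}\, \frac{(r^2\rho^2)^i}{i!}$$
The result is established by applying identity (7) wherein $k$ is replaced by $k - 1$. Thus, one has the following closed-form representation of the exact density function of $R$:
$$g_1(r) = \frac{1}{\pi\,(n-3)!}\, 2^{n-3}\, (1-r^2)^{\frac{n-4}{2}}\, (1-\rho^2)^{\frac{n-1}{2}} \times \left[\Gamma^2\!\left(\frac{n}{2} - \frac{1}{2}\right) {}_2F_1\!\left(\frac{n}{2} - \frac{1}{2}, \frac{n}{2} - \frac{1}{2};\, \frac{1}{2};\, \rho^2 r^2\right) + 2\rho r\, \Gamma^2\!\left(\frac{n}{2}\right) {}_2F_1\!\left(\frac{n}{2}, \frac{n}{2};\, \frac{3}{2};\, \rho^2 r^2\right)\right]$$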
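The equivalence between the power series in $2r\rho$ and the two-term hypergeometric expression can be checked numerically. The sketch below compares the two sides for a fixed $n$, writing `x` for $2r\rho$; `hyp_series`, `fisher_sum` and `two_term_form` are hypothetical helper names, and the hypergeometric functions are summed term by term rather than taken from a library.

```python
import math

def hyp_series(a, b, z, tol=1e-16, max_terms=100000):
    """Sum the generalized hypergeometric series pFq(a; b; z) term by term."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        ratio = z / (k + 1.0)
        for a_j in a:
            ratio *= a_j + k
        for b_j in b:
            ratio /= b_j + k
        term *= ratio
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total

def fisher_sum(x, n, terms=500):
    """Series side: sum over k of x^k / k! * Gamma((k + n - 1)/2)^2."""
    total = 0.0
    for k in range(terms):
        log_coef = 2.0 * math.lgamma(0.5 * (k + n - 1)) - math.lgamma(k + 1)
        total += x ** k * math.exp(log_coef)
    return total

def two_term_form(x, n):
    """Closed-form side: even-k part plus odd-k part, with z = (r * rho)^2."""
    z = 0.25 * x * x
    even = math.gamma(0.5 * (n - 1)) ** 2 * hyp_series([0.5 * (n - 1)] * 2, [0.5], z)
    odd = x * math.gamma(0.5 * n) ** 2 * hyp_series([0.5 * n] * 2, [1.5], z)
    return even + odd

for x in (0.84, -0.6):
    print(x, fisher_sum(x, 10), two_term_form(x, 10))
```

The even-indexed terms of the series feed the first hypergeometric function and the odd-indexed terms feed the second, which is why the two printed values on each line should coincide up to rounding.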
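A simplified representation of this expression can be obtained by making use of the following identity listed under “Quadratic transformations with fixed $a$, $b$, $z$” on the Wolfram website, http://functions.wolfram.com/HypergeometricFunctions/Hypergeometric2F1/17/02/10/ :
$${}_2F_1\!\left(a, b;\, \frac{a+b+1}{2};\, z\right) = \frac{\sqrt{\pi}\; \Gamma\!\left(\frac{a+b+1}{2}\right)}{\Gamma\!\left(\frac{a+1}{2}\right) \Gamma\!\left(\frac{b+1}{2}\right)}\; {}_2F_1\!\left(\frac{a}{2}, \frac{b}{2};\, \frac{1}{2};\, (2z-1)^2\right) + \frac{2\sqrt{\pi}\,(2z-1)\; \Gamma\!\left(\frac{a+b+1}{2}\right)}{\Gamma\!\left(\frac{a}{2}\right) \Gamma\!\left(\frac{b}{2}\right)}\; {}_2F_1\!\left(\frac{a+1}{2}, \frac{b+1}{2};\, \frac{3}{2};\, (2z-1)^2\right)$$
which, on making the substitutions $a \to n-1$, $b \to n-1$ and $z \to (1+\rho r)/2$, becomes
$${}_2F_1\!\left(n-1, n-1;\, n - \frac{1}{2};\, \frac{1+\rho r}{2}\right) = \frac{\sqrt{\pi}\; \Gamma\!\left(n - \frac{1}{2}\right)}{\Gamma\!\left(\frac{n}{2}\right) \Gamma\!\left(\frac{n}{2}\right)}\; {}_2F_1\!\left(\frac{n-1}{2}, \frac{n-1}{2};\, \frac{1}{2};\, \rho^2 r^2\right) + \frac{2r\rho\, \sqrt{\pi}\; \Gamma\!\left(n - \frac{1}{2}\right)}{\Gamma\!\left(\frac{n-1}{2}\right) \Gamma\!\left(\frac{n-1}{2}\right)}\; {}_2F_1\!\left(\frac{n}{2}, \frac{n}{2};\, \frac{3}{2};\, \rho^2 r^2\right)$$
Multiplying both sides by $\Gamma^2\!\left(\frac{n-1}{2}\right) \Gamma^2\!\left(\frac{n}{2}\right) / \left[\Gamma\!\left(n - \frac{1}{2}\right) \sqrt{\pi}\right]$ then yields
$$\frac{\Gamma^2\!\left(\frac{n-1}{2}\right) \Gamma\!\left(\frac{n}{2}\right)^2}{\Gamma\!\left(n - \frac{1}{2}\right) \sqrt{\pi}}\; {}_2F_1\!\left(n-1, n-1;\, n - \frac{1}{2};\, \frac{1+\rho r}{2}\right) = \Gamma^2\!\left(\frac{n-1}{2}\right) {}_2F_1\!\left(\frac{n-1}{2}, \frac{n-1}{2};\, \frac{1}{2};\, \rho^2 r^2\right) + 2\rho r\, \Gamma^2\!\left(\frac{n}{2}\right) {}_2F_1\!\left(\frac{n}{2}, \frac{n}{2};\, \frac{3}{2};\, \rho^2 r^2\right)$$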
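The quadratic transformation, specialized to $a = b = n-1$ and $z = (1+\rho r)/2$, can likewise be verified numerically before it is used. In the sketch below `x` stands for $\rho r$, and `hyp2f1`, `single_term_side` and `two_term_side` are illustrative names; the Gauss series is summed directly, which is an assumption of this example rather than the paper's computational approach.

```python
import math

def hyp2f1(a, b, c, z, tol=1e-16, max_terms=100000):
    """Sum the Gauss series for 2F1(a, b; c; z), valid for |z| < 1."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1.0)) * z
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total

def single_term_side(n, x):
    """2F1(n-1, n-1; n-1/2; (1+x)/2) with x = rho * r."""
    return hyp2f1(n - 1.0, n - 1.0, n - 0.5, 0.5 * (1.0 + x))

def two_term_side(n, x):
    """Right-hand side of the specialized quadratic transformation."""
    c = math.sqrt(math.pi) * math.gamma(n - 0.5)
    first = c / math.gamma(0.5 * n) ** 2 * hyp2f1(0.5 * (n - 1), 0.5 * (n - 1), 0.5, x * x)
    second = 2.0 * x * c / math.gamma(0.5 * (n - 1)) ** 2 * hyp2f1(0.5 * n, 0.5 * n, 1.5, x * x)
    return first + second

for x in (0.42, 0.0, -0.3):
    print(x, single_term_side(10, x), two_term_side(10, x))
```

For negative `x` the two terms on the right-hand side have opposite signs and partly cancel, so the single-function form on the left is the numerically more comfortable one to evaluate.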
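Hence, one obtains the following form of the exact density function of $R$:
$$\frac{2^{n-3}\, \Gamma^2\!\left(\frac{n-1}{2}\right) \Gamma^2\!\left(\frac{n}{2}\right) (1-\rho^2)^{\frac{n-1}{2}}}{\pi^{3/2}\, \Gamma\!\left(n - \frac{1}{2}\right) (n-3)!}\, (1-r^2)^{\frac{n-4}{2}}\; {}_2F_1\!\left(n-1, n-1;\, n - \frac{1}{2};\, \frac{1}{2}(1 + r\rho)\right) = \frac{2^{n-3}\, B^2\!\left(\frac{n-1}{2}, \frac{n}{2}\right) \Gamma\!\left(n - \frac{1}{2}\right) (1-\rho^2)^{\frac{n-1}{2}}}{\pi^{3/2}\, (n-3)!}\, (1-r^2)^{\frac{n-4}{2}}\; {}_2F_1\!\left(n-1, n-1;\, n - \frac{1}{2};\, \frac{1}{2}(1 + \rho r)\right)$$
which, on letting $k = n - 1$ in Equation (6), gives
$$\frac{B^2\!\left(\frac{n-1}{2}, \frac{n}{2}\right) (2n-2)!\, (1-\rho^2)^{\frac{n-1}{2}}}{2^{n+1}\, \pi\, (n-3)!\, (n-1)!}\, (1-r^2)^{\frac{n-4}{2}}\; {}_2F_1\!\left(n-1, n-1;\, n - \frac{1}{2};\, \frac{1}{2}(1 + \rho r)\right)$$
Finally, the following representation of the density function of $R$ is obtained on writing $(2n-2)! / [(n-3)!\,(n-1)!]$ as $(n-2)\,\Gamma(2n-1) / [\Gamma(n-1)\,\Gamma(n)] = (n-2) / B(n-1, n)$:
$$g(r) = \frac{(n-2)\, B^2\!\left(\frac{n-1}{2}, \frac{n}{2}\right) (1-\rho^2)^{\frac{n-1}{2}}}{2^{n+1}\, B(n-1, n)\, \pi}\, (1-r^2)^{\frac{n}{2}-2}\; {}_2F_1\!\left(n-1, n-1;\, n - \frac{1}{2};\, \frac{1+\rho r}{2}\right) \tag{16}$$
Incidentally, this expression is more compact than that proposed by Hotelling [4].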
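For readers who wish to relate the two expressions, the following worked step may help. It assumes the form in which Hotelling's density is usually quoted, which is not reproduced in this paper, and uses Euler's transformation ${}_2F_1(a, b; c; z) = (1-z)^{c-a-b}\, {}_2F_1(c-a, c-b; c; z)$. With $a = b = n-1$, $c = n - \frac{1}{2}$ and $z = (1+\rho r)/2$, one has $c - a = c - b = \frac{1}{2}$ and $c - a - b = \frac{3}{2} - n$, so that

$${}_2F_1\!\left(n-1, n-1;\, n - \tfrac{1}{2};\, \tfrac{1+\rho r}{2}\right) = \left(\frac{1 - \rho r}{2}\right)^{\frac{3}{2} - n} {}_2F_1\!\left(\tfrac{1}{2}, \tfrac{1}{2};\, n - \tfrac{1}{2};\, \tfrac{1+\rho r}{2}\right)$$

Substituting this into the closed-form density and simplifying the constants with the Legendre duplication formula gives

$$g(r) = \frac{(n-2)\, \Gamma(n-1)}{\sqrt{2\pi}\; \Gamma\!\left(n - \frac{1}{2}\right)}\, (1-\rho^2)^{\frac{n-1}{2}}\, (1-r^2)^{\frac{n-4}{2}}\, (1 - \rho r)^{-\left(n - \frac{3}{2}\right)}\; {}_2F_1\!\left(\tfrac{1}{2}, \tfrac{1}{2};\, n - \tfrac{1}{2};\, \tfrac{1+\rho r}{2}\right)$$

which carries the additional factor $(1-\rho r)^{-(n-3/2)}$ in exchange for a hypergeometric series whose terms decay quickly.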

3. Closed Forms for the Moments of R

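It is shown in this section that the moments of $R$ can also be expressed in closed forms. The following moment expressions are available in Anderson [5], pp. 151–152:
$$E(R^k) = \frac{(1-\rho^2)^{\frac{n-1}{2}}}{\sqrt{\pi}\; \Gamma\!\left(\frac{n-1}{2}\right)} \sum_{i=0}^{\infty} \frac{(2\rho)^{2i+1}}{(2i+1)!}\, \frac{\Gamma\!\left(\frac{3}{2} + \frac{k-1}{2} + i\right) \Gamma^2\!\left(\frac{n}{2} + i\right)}{\Gamma\!\left(\frac{n+1}{2} + \frac{k-1}{2} + i\right)} \quad \text{for } k \text{ odd} \tag{17}$$
and
$$E(R^k) = \frac{(1-\rho^2)^{\frac{n-1}{2}}}{\sqrt{\pi}\; \Gamma\!\left(\frac{n-1}{2}\right)} \sum_{i=0}^{\infty} \frac{(2\rho)^{2i}}{(2i)!}\, \frac{\Gamma\!\left(\frac{1}{2} + \frac{k}{2} + i\right) \Gamma^2\!\left(\frac{n}{2} - \frac{1}{2} + i\right)}{\Gamma\!\left(\frac{n-1}{2} + \frac{k}{2} + i\right)} \quad \text{for } k \text{ even} \tag{18}$$
We will show that when $k$ is odd,
$$E(R^k) = \frac{2\rho\, (1-\rho^2)^{\frac{n-1}{2}}\, \Gamma\!\left(\frac{k}{2} + 1\right) \Gamma^2\!\left(\frac{n}{2}\right)}{\sqrt{\pi}\; \Gamma\!\left(\frac{n-1}{2}\right) \Gamma\!\left(\frac{k+n}{2}\right)}\; {}_3F_2\!\left(\frac{k}{2} + 1, \frac{n}{2}, \frac{n}{2};\, \frac{3}{2}, \frac{k}{2} + \frac{n}{2};\, \rho^2\right) \tag{19}$$
and when $k$ is even,
$$E(R^k) = \frac{(1-\rho^2)^{\frac{n-1}{2}}\, \Gamma\!\left(\frac{k+1}{2}\right) \Gamma\!\left(\frac{n-1}{2}\right)}{\sqrt{\pi}\; \Gamma\!\left(\frac{k+n-1}{2}\right)}\; {}_3F_2\!\left(\frac{k}{2} + \frac{1}{2}, \frac{n}{2} - \frac{1}{2}, \frac{n}{2} - \frac{1}{2};\, \frac{1}{2}, \frac{k}{2} + \frac{n}{2} - \frac{1}{2};\, \rho^2\right) \tag{20}$$
where the generalized hypergeometric function, ${}_pF_q(a_1, \ldots, a_p;\, b_1, \ldots, b_q;\, z)$, is as defined in Equation (3).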
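Since
$${}_3F_2(n_1, n_2, n_3;\, d_1, d_2;\, v) = \sum_{k=0}^{\infty} \frac{\Gamma(n_1 + k)\, \Gamma(n_2 + k)\, \Gamma(n_3 + k)\, \Gamma(d_1)\, \Gamma(d_2)\, v^k}{\Gamma(n_1)\, \Gamma(n_2)\, \Gamma(n_3)\, \Gamma(d_1 + k)\, \Gamma(d_2 + k)\, k!}$$
then, according to Equation (19), when $k$ is odd, one has
$$\begin{aligned} E(R^k) &= \frac{(1-\rho^2)^{\frac{n-1}{2}}}{\sqrt{\pi}\; \Gamma\!\left(\frac{n-1}{2}\right)}\, \frac{2\rho\, \Gamma\!\left(\frac{k}{2} + 1\right) \Gamma^2\!\left(\frac{n}{2}\right)}{\Gamma\!\left(\frac{k+n}{2}\right)} \sum_{i=0}^{\infty} \frac{\Gamma\!\left(\frac{k}{2} + 1 + i\right) \Gamma^2\!\left(\frac{n}{2} + i\right) \Gamma\!\left(\frac{3}{2}\right) \Gamma\!\left(\frac{k}{2} + \frac{n}{2}\right) \rho^{2i}}{\Gamma\!\left(\frac{k}{2} + 1\right) \Gamma^2\!\left(\frac{n}{2}\right) \Gamma\!\left(\frac{3}{2} + i\right) \Gamma\!\left(\frac{k}{2} + \frac{n}{2} + i\right) i!} \\ &= \frac{(1-\rho^2)^{\frac{n-1}{2}}}{\sqrt{\pi}\; \Gamma\!\left(\frac{n-1}{2}\right)} \sum_{i=0}^{\infty} \frac{2\, \Gamma\!\left(\frac{k}{2} + 1 + i\right) \Gamma^2\!\left(\frac{n}{2} + i\right) \Gamma\!\left(\frac{3}{2}\right) \rho^{2i+1}}{\Gamma\!\left(\frac{k}{2} + \frac{n}{2} + i\right) \Gamma\!\left(\frac{3}{2} + i\right) i!} \end{aligned}$$
which, in light of Equation (8), that is, $\frac{\Gamma(3/2)}{\Gamma(3/2 + i)\; i!} = \frac{2^{2i}}{(2i+1)!}$, is seen to be equal to the expression given in Equation (17).
Now, when $k$ is even, according to Equation (20), one has
$$\begin{aligned} E(R^k) &= \frac{(1-\rho^2)^{\frac{n-1}{2}}}{\sqrt{\pi}}\, \frac{\Gamma\!\left(\frac{k+1}{2}\right) \Gamma\!\left(\frac{n-1}{2}\right)}{\Gamma\!\left(\frac{1}{2}(k+n-1)\right)} \sum_{i=0}^{\infty} \frac{\Gamma\!\left(\frac{k}{2} + \frac{1}{2} + i\right) \Gamma^2\!\left(\frac{n}{2} - \frac{1}{2} + i\right) \Gamma\!\left(\frac{1}{2}\right) \Gamma\!\left(\frac{k}{2} + \frac{n}{2} - \frac{1}{2}\right) \rho^{2i}}{\Gamma\!\left(\frac{k}{2} + \frac{1}{2}\right) \Gamma^2\!\left(\frac{n}{2} - \frac{1}{2}\right) \Gamma\!\left(\frac{1}{2} + i\right) \Gamma\!\left(\frac{k}{2} + \frac{n}{2} - \frac{1}{2} + i\right) i!} \\ &= \frac{(1-\rho^2)^{\frac{n-1}{2}}}{\sqrt{\pi}} \sum_{i=0}^{\infty} \frac{\Gamma\!\left(\frac{k}{2} + \frac{1}{2} + i\right) \Gamma^2\!\left(\frac{n}{2} - \frac{1}{2} + i\right) \Gamma\!\left(\frac{1}{2}\right) \rho^{2i}}{\Gamma\!\left(\frac{n}{2} - \frac{1}{2}\right) \Gamma\!\left(\frac{k}{2} + \frac{n}{2} - \frac{1}{2} + i\right) \Gamma\!\left(\frac{1}{2} + i\right) i!} \end{aligned}$$
which turns out to be equal to the right-hand side of Equation (18) on noting that, as proved earlier, $\frac{\Gamma(1/2)}{\Gamma(1/2 + i)\; i!} = \frac{2^{2i}}{(2i)!}$.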
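The closed-form moments in Equations (19) and (20) and the truncated series in Equations (17) and (18) can be compared with a short Python sketch such as the one below. It treats the odd and even cases together through the parity `p = k % 2`, which is a presentational shortcut of this example; the function names are hypothetical, and the ${}_3F_2$ series is summed directly instead of calling a library routine.

```python
import math

def hyp_series(a, b, z, tol=1e-16, max_terms=100000):
    """Sum the generalized hypergeometric series pFq(a; b; z) term by term."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        ratio = z / (k + 1.0)
        for a_j in a:
            ratio *= a_j + k
        for b_j in b:
            ratio /= b_j + k
        term *= ratio
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total

def moment_closed_form(k, n, rho):
    """E(R^k) from the 3F2 representation: p = 1 is Eq. (19), p = 0 is Eq. (20)."""
    p = k % 2
    a1 = 0.5 * (k + 1 + p)       # k/2 + 1 (odd) or (k + 1)/2 (even)
    a2 = 0.5 * (n - 1 + p)       # n/2 (odd) or (n - 1)/2 (even)
    b1 = 0.5 + p                 # 3/2 (odd) or 1/2 (even)
    b2 = 0.5 * (k + n - 1 + p)   # (k + n)/2 (odd) or (k + n - 1)/2 (even)
    log_coef = (0.5 * (n - 1) * math.log1p(-rho * rho) - 0.5 * math.log(math.pi)
                + math.lgamma(a1) + 2.0 * math.lgamma(a2)
                - math.lgamma(0.5 * (n - 1)) - math.lgamma(b2))
    series = hyp_series([a1, a2, a2], [b1, b2], rho * rho)
    return (2.0 * rho) ** p * math.exp(log_coef + math.log(series))

def moment_series(k, n, rho, m):
    """First m terms of the series: p = 1 is Eq. (17), p = 0 is Eq. (18)."""
    p = k % 2
    x = 2.0 * abs(rho)
    sign = -1.0 if (rho < 0.0 and p == 1) else 1.0
    log_pref = (0.5 * (n - 1) * math.log1p(-rho * rho) - 0.5 * math.log(math.pi)
                - math.lgamma(0.5 * (n - 1)))
    total = 0.0
    for i in range(m):
        e = 2 * i + p            # exponent of 2 * rho in the i-th term
        if x == 0.0 and e > 0:
            break                # remaining terms vanish when rho = 0
        log_term = (math.lgamma(0.5 * (k + 1 + p) + i)
                    + 2.0 * math.lgamma(0.5 * (n - 1 + p) + i)
                    - math.lgamma(0.5 * (k + n - 1 + p) + i)
                    - math.lgamma(e + 1))
        if e > 0:
            log_term += e * math.log(x)
        total += sign * math.exp(log_pref + log_term)
    return total

print(moment_closed_form(7, 800, 0.75))
print(moment_series(7, 800, 0.75, 1000))
```

The case printed here, $(n, \rho, k) = (800, 0.75, 7)$, is one of those listed in Table 3, where the reported seventh moment is 0.134421.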

4. Numerical Examples

When the series representations of the density function or the moments of R are utilized, the number of terms required to achieve convergence depends on the length of the observation vector, the underlying correlation coefficient and the point at which the density function is evaluated in the former case or the order of the required moment in the latter. In certain instances, even 1000 terms turn out to be insufficient. The proposed closed-form expressions, which for all intents and purposes produce exact numerical results, can be evaluated much more quickly.
Consider for example the case $n = 10$ and $\rho = -0.97$. Table 1 reports the values of the probability density function (PDF) of $R$, first determined from $f(r)$ as specified by Equation (2), truncated to 500 and 1000 terms, and then from $g(r)$, the exact closed-form representation given in Equation (16), for $r = -0.99, -0.25, 0.05, 0.25, 0.95$.
Table 1. PDF of $R$ as evaluated from $f(r)$ truncated to $m$ terms and $g(r)$.
| $r$ | $f(r)$ [$m = 500$] | $f(r)$ [$m = 1000$] | $g(r)$ (closed form) |
| --- | --- | --- | --- |
| $-0.99$ | 21.0839 | 21.1043 | 21.1043 |
| $-0.25$ | 0.0000284304 | 0.0000284304 | 0.0000284304 |
| 0.05 | $2.15111 \times 10^{-6}$ | $2.15111 \times 10^{-6}$ | $2.15111 \times 10^{-6}$ |
| 0.25 | $4.20668 \times 10^{-7}$ | $4.20668 \times 10^{-7}$ | $4.20668 \times 10^{-7}$ |
| 0.95 | $4.61344 \times 10^{-11}$ | $1.1523 \times 10^{-11}$ | $1.15232 \times 10^{-11}$ |
Similarly, when $n = 75$ and $\rho = 0.80$, one obtains the numerical results appearing in Table 2.
Table 2. PDF of $R$ as evaluated from $f(r)$ truncated to $m$ terms and $g(r)$.
| $r$ | $f(r)$ [$m = 500$] | $f(r)$ [$m = 1000$] | $g(r)$ (closed form) |
| --- | --- | --- | --- |
| $-0.90$ | $1.08277 \times 10^{-18}$ | $1.07281 \times 10^{-18}$ | $1.57819 \times 10^{-59}$ |
| $-0.60$ | $4.50675 \times 10^{-19}$ | $4.50675 \times 10^{-19}$ | $5.23693 \times 10^{-36}$ |
| 0.60 | 0.0128167 | 0.0128167 | 0.0128167 |
| 0.95 | $6.01144 \times 10^{-7}$ | $6.01144 \times 10^{-7}$ | $6.01144 \times 10^{-7}$ |
Certain moments of $R$ are included in Table 3 for some values of $k$, $n$ and $\rho$, along with the computing times associated with the evaluation of the truncated series representations of the moments given in Equations (17) and (18) and the closed-form representations specified by Equations (19) and (20). We observed that the computing times can be significantly reduced by making use of the closed-form expressions. All the calculations were carried out with the symbolic computing software Mathematica, the code being available from the author upon request.
Table 3. Certain moments of $R$ and associated computing times in seconds.
| Formula | $(n, \rho, k)$ | $k$th moment | Timing (s) |
| --- | --- | --- | --- |
| (17) 1000 terms | (800, 0.75, 7) | 0.134421 | 0.468 |
| (19) closed-form | (800, 0.75, 7) | 0.134421 | 0.032 |
| (18) 1000 terms | $(200, -0.91, 12)$ | 0.324631 | 0.577 |
| (20) closed-form | $(200, -0.91, 12)$ | 0.324631 | 0.047 |
| (17) 1000 terms | (8, 0.255, 23) | 0.001752 | 0.327 |
| (19) closed-form | (8, 0.255, 23) | 0.001752 | $5.72459 \times 10^{-16}$ |
| (18) 1000 terms | (60, 0.051, 36) | $1.16476 \times 10^{-13}$ | 0.514 |
| (20) closed-form | (60, 0.051, 36) | $1.16476 \times 10^{-13}$ | $6.67869 \times 10^{-16}$ |

Acknowledgments

The financial support of the Natural Sciences and Engineering Research Council of Canada is gratefully acknowledged. Thanks are also due to two referees for their valuable comments and suggestions.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Fisher, R.A. Distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika 1915, 10, 507–521.
  2. Olver, F.W.J.; Lozier, D.W.; Boisvert, R.; Clark, C.W. NIST Handbook on Mathematical Functions; Cambridge University Press: Cambridge, UK, 2010.
  3. Winterbottom, A. A note on the derivation of Fisher’s transformation of the correlation coefficient. Am. Stat. 1979, 33, 142–143.
  4. Hotelling, H. New light on the correlation coefficient and its transforms. J. R. Stat. Soc. Ser. B 1953, 15, 193–232.
  5. Anderson, T.W. An Introduction to Multivariate Statistical Analysis; Wiley: New York, NY, USA, 1984.

Share and Cite

Provost, S.B. Closed-Form Representations of the Density Function and Integer Moments of the Sample Correlation Coefficient. Axioms 2015, 4, 268-274. https://doi.org/10.3390/axioms4030268