Fast Computation of the Non-Central Chi Square PDF Outside the HDR Under a Requisite Precision Constraint

Computation of the non-central chi square probability density function is encountered in diverse fields of applied statistics and engineering. The distribution is commonly computed as a Poisson mixture of central chi square densities, where the terms of the sum are computed starting with the integer nearest the non-centrality parameter. However, for computation of the values in either tail region these terms are not the most significant and starting with them results in an increased computational load without a corresponding increase in accuracy. The most significant terms are shown to be a function of both the non-centrality parameter, the degree of freedom and the point of evaluation. A computationally simple approximate solution to the location of the most significant terms as well as the exact solution based on a Newton–Raphson iteration is presented. A quadratic approximation of the interval of summation is also developed in order to meet a requisite number of significant digits of accuracy. Computationally efficient recursions are used over these improved intervals. The method provides a means of computing the non-central chi square probability density function to a requisite accuracy as a Poisson mixture over all domains of interest.


Introduction
Efficiently evaluating the non-central chi square probability density function (PDF) is of practical importance to a number of problems in applied statistics [1].For instance, the ratio of two densities f (x)/p(x), where p(x) is a non-central chi square density, may be necessary in an importance sampling scheme or for hypothesis testing.The non-central chi square density [2], where λ is the non-centrality parameter and ν the degree of freedom, arises in the general case where x = e Σ −1 e and e ∼ N (µ, Σ), e ∈ R ν , and Σ positive definite.It follows that x ∼ χ 2 ν (λ) where λ = µ Σ −1 µ.The notation e ∼ N (µ, Σ) implies that e has Gaussian distribution [3], N (µ, Σ) = (2π) −ν/2 |Σ| −1/2 exp(− 1  2 (e − µ) Σ −1 (e − µ)) and x ∼ χ 2 ν (λ) denotes that x is distributed as a non-central chi square with ν degrees of freedom and non-centrality parameter λ.For a non-central chi square variate E[x] = λ + ν, and V ar[x] = 4λ + 2ν.The density at x can be represented [2] by where I ν (x) is the modified Bessel function of order ν.However this representation is numerically problematic for large x or large λ.The asymptotic result (x − (λ + ν)) / √ 2ν + 4λ ∼ N (0, 1) as λ or ν → ∞ can be useful, but is not a universal substitute for direct computation of p(x; ν, λ), even for relatively large λ.For this reason the discrete mixture representation where it is recognized that the expectation is over the Poisson density [4] is the central chi square probability density function (PDF) [3] with ν degrees of freedom.The non-central chi square density at the point x is therefore the average of an infinite set of central chi square densities evaluated at x.The goal of this article is to present a method to compute the value of the distribution function of a non-central chi square variate using the mixture representation and to do so with minimal computational effort to a specified accuracy in all domains of interest, including both the high density region (HDR) and the tail regions.In doing so, we will illuminate the often employed standard method and discuss the computational weaknesses that method has in the tails of the distribution.The cumulative density function (CDF) x 0 p(α; ν, λ)dα of the non central chi square has been addressed by [5] and the computational issues associated with the domains of evaluation have been addressed by a treatment dependent on the parameters, {x, ν, λ} [6].

Computation of p(x; ν, λ)
With the goal to compute p(x; ν, λ) to a specified accuracy for arbitrary x, λ and ν in a computationally efficient manner, start with the representation of the non-central chi-square density as a mixture of central chi-square densities: The sum is computed efficiently by ordering the terms to be summed according to their size, with the larger terms first and then for each of the terms to be summed exploiting the following recursions: thus ensuring that each term is computed in a computational efficient manner.It is necessary to determine which terms maximally contribute to the sum and including these in the summation in order to meet the requisite accuracy.This domain of summation must provide for maximal computational efficiency for a given accuracy, or conversely minimal error for a pre-specified computational budget.
A popular approach, which is the basis of most algorithms in use, is to start the recursion at the mode of the Poisson density and proceed to sum in both directions until the relative accuracy is obtained.This approach is termed Algorithm 1.
Algorithm 1 This approach is nearly optimal for x in the high probability density region, however it is quite inefficient in the tail regions of the density.That is the mode, λ/2 is quite distant from the maximally contributing terms of the summation in regions other than the high density region.For this reason, starting at λ/2 , as in Algorithm 1, wrongly focuses computation on terms that contribute little information to the numerical result.The optimal solution would start at the location of the peak of f (x, n; λ, ν).Determine this starting point by noting that as a function of Define and employ the series expansion for ψ(n + 1), the digamma function where B k the k-th Bernoulli number, to yield Define and an approximate solution for the integer n ≥ 0 is implying , the expected value of the Poisson mixture weights.The advantage of using Equation ( 11) is that it is both relatively simple, requiring only a square root operation on integers, and is accurate at locating the largest terms of the sum in both the high density region and the tail regions.As will be shown for domains outside of the high density region n * is computationally efficiency relative to starting at λ/2 .
To determine argmax n f (x, n|ν, λ) exactly, use a few Newton-Raphson iterations starting from Equation (11) as where Since the solution is needed only to the nearest integer, a single iteration of Equation ( 12) is sufficient.
Algorithm 2 The result is computationally more efficient than Algorithm 1 for x outside the high probability region for a pre-specified accuracy.It does, however, require the initial computation of n * via Equations ( 11) or (12) as well as retaining the need for computing the comparisons of Step 3 for each iteration.Employing the more accurate starting point of Equation ( 12) requires a Newton-Raphson iteration and thus the evaluation of the information scale, I(n).Since integer arithmetic is computationally insignificant relative to floating point arithmetic, the approach has merit for x outside the high probability region.
There is a further advantage to computing the information scale.The information scale I(n) gives an approximate measure of the number of terms required.This brings us to the last method presented: eliminate Step 3 from Algorithm 2 by simply approximating the domain of summation with a Laplace approximation [7] to f (x, n; λ, ν).The information scale is approximated with These derivatives are approximations based on the truncated expansion of ln(Γ(n)) , where O(n p ) is polynomial in n with maximal degree p.The decimal place contribution of the term f (x, n; λ, ν) is with a lower bound We can attain accuracy to B decimal places by including in the sum terms for which Therefore let and include in the summation only those terms within the set D * (B).The advantage of computing the domain of summation a priori is that it obviates the need to compute the relative errors and the computation can be performed with a simple for loop.The terms can be summed from least to greatest, with terms near n = n * + D * (B) summed first in order to minimize accumulation errors.The disadvantage is that the set is larger than necessary, due to the approximation of Equation (15) as well as the Laplace approximation.
where it is understood that the comparison of step 3 need not be made, as a for loop can perform this function implicitly.

Results
The three algorithms are compared to illustrate the importance of initialization of the algorithm by proper determination of n * (x, ν, λ) in the computation of the non-central chi square PDF in the tail regions.First, Figure 1 depicts the normalized joint densities Figure 2 likewise depicts the normalized densities r(x, n|λ, ν) for a degree of freedom parameter value of ν = 20 and a range of non-centrality parameters λ =1, 8, 20 and 100.The increase in degree of freedom parameter relative to Figure 1 is shown to not significantly alter n * (x, ν, λ).The upper and lower bounds of Algorithm 3 for B = 8 are shown.It is noted that for λ = 100 depicted in the lower right of Figures 1 and 2, the difference between n * and λ/2 in the lower tail of the distribution is also quite stark.
In order to demonstrate the performance of each of the three algorithms, they are each displayed relative to the value of the PDF computed from using the Algorithm 3 and summing least to greatest to obviate any accumulation errors.Define the relative errors of the 3 algorithms as follows    Figure 3 displays the relative errors for the three algorithms, with = 10 −4 .To the left and center are Algorithms 1 and 2 respectively, and the figure demonstrates the close proximity of each to a relative error of 10 −4 .To the right is Algorithm 3, and it can be seen that the use of the bounding approximation of Equation (15) implies that more terms are being included, such that an extra decimal place is attained across all non-centrality parameters shown.

Conclusions
The Poisson mixture representation of the non-central chi-square PDF is a useful and commonly employed means of computing the PDF at various domain values.The summation of terms is often initialized at the integer closest to half of the non-centrality parameter, the mode of the Poisson mixture weights λ/2 , and is efficient in the high probability density region (HDR).Outside of the HDR, the method is computationally inefficient, since the dominant terms in the sum are distant from the mode λ/2 .This shortcoming is addressed here by providing the maximally contributing terms in the sum regardless of the domain of evaluation.We do so in order to meet a requisite accuracy and computational load.The approach provides a simple means to retain the mixture representation for computation of the PDF outside of the HDR in a manner that is computationally reasonable.We present two methods to accomplish this task outside the HDR that provide a computation versus accuracy trade-off that is comparable within the HDR.

for ν = 4
and for diverse non-centrality parameters of λ =1, 8, 20 and 100.The initialization point, n * for Algorithm 2 and Algorithm 3 are shown, as well as the conventional starting point λ/2 of Algorithm 1.The difference between the actual maximum and λ/2 is quite stark outside the HDR.Also shown are the Laplace approximate upper (Ub) and lower bounds (Lb) associated with B = 4 used for Algorithm 3.

Figure 1 .
Figure 1.The magnitude of the normalized terms r(x, n|λ, ν) for ν = 4.The peak is shown as a function of x for various non-centrality parameters λ.The difference between λ/2 and the actual n * = argmax n f (x, n|λ, ν) in the tail regions is apparent.The upper (Ub) and lower (Lb) bounds of Algorithm 3 for the interval of summation associated with B = 4 decimal place accuracy.

Figure 2 .
Figure 2. The magnitude of the normalized terms r(x, n|λ, ν) for ν = 20.The peak is shown as a function of x for various non-centrality parameters λ.The difference between λ/2 and the actual n * = argmax n r(x, n|λ, ν) in the tail regions is apparent.The upper (Ub) and lower (Lb) bounds of Algorithm 3 for the interval of summation associated with B = 8 decimal place accuracy.

Figure 3 .
Figure 3. Left to right, the relative errors with B = 4, for the three algorithms.Left, Algorithm 1 iterating starting from λ/2 .Center, Algorithm 2 iterating from n * .To the right, Algorithm 3 iterating from n * and with the index set computed a priori.

Figure 4
Figure4provides the number of terms necessary to attain the performance shown in Figure3for each of the algorithms.It is noted that only for λ = 100 in Algorithm 3 are the number of terms in excess of Algorithm 1 within the HDR .It is noteworthy that Algorithm 3 outperforms Algorithm 1 by over 2 × 40 floating point operations per evaluation in the tail region, since there are two additional floating point operation per iteration in Algorithm 1 that are not necessary in Algorithm 2.

Figure 4 .
Figure 4.The number of terms necessary for the three algorithms to attain the accuracy shown in Figure 3 for ν = 4 and for various λ of 1, 8, 20 and 100.Algorithm 1, blue dashed.Algorithm 2, red dashed.Algorithm 3, black solid.