Engineering Proceedings
  • Proceeding Paper
  • Open Access

10 April 2025

Calculating Percentiles of T-Distribution Using Gaussian Integration Method †

Department of Vehicle Engineering, National Formosa University, Yunlin 632, Taiwan
Presented at the 2024 IEEE 6th Eurasia Conference on IoT, Communication and Engineering, Yunlin, Taiwan, 15–17 November 2024.
This article belongs to the Proceedings 2024 IEEE 6th Eurasia Conference on IoT, Communication and Engineering

Abstract

Statistical inference estimates population parameters from sample information and quantifies the sampling error in probabilistic terms. The population mean is inferred from the sample mean; when the population variance is unknown, the sample variance is used in its place, and the sampling error is then quantified with the t-distribution. Determining the percentiles of the t-distribution requires its cumulative distribution function, which has no simple closed-form expression for general degrees of freedom, so its values are obtained by numerical integration. Textbooks on probability theory and mathematical statistics usually do not tabulate the percentiles for every degree of freedom above 30 and list them only at intervals of 10, which is inconvenient for research. Therefore, in this study, the cumulative distribution function of the t-distribution was calculated using the Gaussian integration method. The results show that the percentiles of the t-distribution are accurately estimated by the algorithm developed in this study.

1. Introduction

A census measures or examines every member of a population, whereas sampling measures or examines only some of its members. A census is costly and time-consuming, and human errors occur due to personnel fatigue. Sampling is therefore often more efficient than a census, but its sampling error must be quantified.
The t-distribution is used in the quantitative analysis of sampling errors. To determine the percentiles of the t-distribution, its cumulative distribution function is used [1,2,3]. This function is evaluated by numerical integration. In this study, the Gaussian integration method is adopted to integrate the probability density function of the t-distribution [4,5]. The Gaussian integration method approximates the integrand by a polynomial. If the step size is small enough to keep the truncation error negligible, an accurate numerical integral is expected. The results of this study show that highly accurate percentiles of the t-distribution can be obtained using the algorithm developed in this article.

2. Gaussian Integration Method

In the Gaussian integration method, −1 and +1 are taken as the lower and upper limits of integration, respectively. The integral is obtained by summing the integrand values at specific points, each multiplied by its corresponding weight. These specific points are called Gaussian points [4,5].
$$ I = \int_{-1}^{+1} f(\lambda)\, d\lambda = \sum_{i=1}^{m} w_i f(\lambda_i) \tag{1} $$
For example, the two-point formula (m = 2 in (1)) is analyzed as follows.
When m = 2 in (1), two weighting coefficients, $w_1$ and $w_2$, and two sampling points, $\lambda_1$ and $\lambda_2$, must be determined in advance. These are four unknowns, so let the integrand be a cubic polynomial with four coefficients:
$$ f(\lambda) = a_0 + a_1 \lambda + a_2 \lambda^2 + a_3 \lambda^3 \tag{2} $$
For arbitrary $a_i$, $i = 0, 1, \ldots, 3$, the following condition must be satisfied:
$$ \int_{-1}^{+1} \left( a_0 + a_1 \lambda + a_2 \lambda^2 + a_3 \lambda^3 \right) d\lambda = w_1 f(\lambda_1) + w_2 f(\lambda_2) \tag{3} $$
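Because the coefficients are arbitrary, (3) must hold in particular when $a_0 \neq 0$ and $a_1 = a_2 = a_3 = 0$. Therefore,
$$ w_1 + w_2 = 2 \tag{4} $$
Let $a_1 \neq 0$ and $a_0 = a_2 = a_3 = 0$ in (3), then
$$ w_1 \lambda_1 + w_2 \lambda_2 = 0 \tag{5} $$
When $a_2 \neq 0$ and $a_0 = a_1 = a_3 = 0$ in (3), we have
$$ \int_{-1}^{+1} a_2 \lambda^2\, d\lambda = w_1 a_2 \lambda_1^2 + w_2 a_2 \lambda_2^2 \;\Rightarrow\; w_1 \lambda_1^2 + w_2 \lambda_2^2 = \frac{2}{3} \tag{6} $$
When $a_3 \neq 0$ and $a_0 = a_1 = a_2 = 0$ in (3),
$$ \int_{-1}^{+1} a_3 \lambda^3\, d\lambda = w_1 a_3 \lambda_1^3 + w_2 a_3 \lambda_2^3 \;\Rightarrow\; w_1 \lambda_1^3 + w_2 \lambda_2^3 = 0 \tag{7} $$
Solving (4) to (7) gives
$$ w_1 = w_2 = 1.0, \quad \lambda_1 = -\frac{1}{\sqrt{3}}, \quad \text{and} \quad \lambda_2 = \frac{1}{\sqrt{3}} \tag{8} $$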
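The two-point solution can be checked numerically. The following is a minimal sketch, not the author's code: it substitutes the weights and points into the left-hand sides of conditions (4) to (7) and then integrates an arbitrary cubic. The names `moments`, `two_point`, and `cubic`, and the cubic's coefficients, are illustrative assumptions.

```python
import math

# Two-point Gaussian rule obtained from conditions (4)-(7)
w1 = w2 = 1.0
lam1, lam2 = -1.0 / math.sqrt(3.0), 1.0 / math.sqrt(3.0)

# Left-hand sides w1*lam1^k + w2*lam2^k for k = 0, 1, 2, 3
moments = [w1 * lam1**k + w2 * lam2**k for k in range(4)]
# Right-hand sides of (4)-(7): integrals of lambda^k over [-1, +1]
expected = [2.0, 0.0, 2.0 / 3.0, 0.0]


def two_point(f):
    """Approximate the integral of f over [-1, +1] with the two-point rule."""
    return w1 * f(lam1) + w2 * f(lam2)


def cubic(lam):
    """Arbitrary cubic chosen only for illustration."""
    return 4.0 - 3.0 * lam + 2.0 * lam**2 + 5.0 * lam**3


# Odd powers integrate to zero, so the exact integral is 2*4 + 2*(2/3)
cubic_exact = 8.0 + 4.0 / 3.0
cubic_gauss = two_point(cubic)
```

The rule reproduces the integral of the cubic to rounding error although it samples the integrand at only two points.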
The accuracy of the evaluation increases with the number of Gaussian points. The sampling points and their corresponding weighting coefficients for 1 through 6 sampling points are listed in Table 1 [4,5].
Table 1. Sampling points and their corresponding weighting coefficients for 1 through 6 sampling points.
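Sampling points and weights of this kind can be regenerated numerically. The sketch below uses a standard construction that is assumed here rather than taken from the paper: the sampling points are the roots of the Legendre polynomial $P_m$, found by Newton's method, and each weight is $2 / [(1 - \lambda_i^2) P_m'(\lambda_i)^2]$. The function names are hypothetical.

```python
import math


def legendre_and_derivative(m, x):
    """Return P_m(x) and P_m'(x) for m >= 1 via the three-term recurrence."""
    p_prev, p = 1.0, x  # P_0 and P_1
    for k in range(2, m + 1):
        p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
    # Derivative identity, valid for |x| != 1 (Gaussian points are interior)
    dp = m * (x * p - p_prev) / (x * x - 1.0)
    return p, dp


def gauss_legendre(m):
    """Return (points, weights) of the m-point rule on [-1, +1], ascending."""
    points, weights = [], []
    for i in range(1, m + 1):
        # Common initial guess for the i-th root of P_m
        x = math.cos(math.pi * (i - 0.25) / (m + 0.5))
        for _ in range(100):
            p, dp = legendre_and_derivative(m, x)
            dx = p / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        _, dp = legendre_and_derivative(m, x)
        points.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    order = sorted(range(m), key=lambda j: points[j])
    return [points[j] for j in order], [weights[j] for j in order]


# Rules for 1 through 6 sampling points
rules = {m: gauss_legendre(m) for m in range(1, 7)}
```

For m = 2 this reproduces the values in (8), and for every m the weights sum to 2, the length of the interval.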
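If the lower and upper limits are a and b rather than −1 and +1, the integral is
$$ I = \int_{a}^{b} f(x)\, dx \tag{9} $$
Equation (9) can be normalized with the linear change of variable
$$ x = h_0 + h_1 \lambda \tag{10} $$
Then,
$$ a = h_0 + h_1 (-1) \tag{11} $$
$$ b = h_0 + h_1 (+1) \tag{12} $$
By solving (11) and (12),
$$ h_0 = \frac{b + a}{2} \tag{13} $$
$$ h_1 = \frac{b - a}{2} \tag{14} $$
$$ x = \frac{b + a}{2} + \frac{b - a}{2} \lambda \tag{15} $$
$$ dx = \frac{b - a}{2}\, d\lambda \tag{16} $$
Substituting (15) and (16) into (9), the following is obtained:
$$ I = \frac{b - a}{2} \int_{-1}^{+1} f\!\left( \frac{b + a}{2} + \frac{b - a}{2} \lambda \right) d\lambda = \frac{b - a}{2} \int_{-1}^{+1} F(\lambda)\, d\lambda = \frac{b - a}{2} \sum_{i=1}^{m} w_i F(\lambda_i) \tag{17} $$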
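The change of variable can be sketched in a few lines. The helper `gauss_ab` below is a hypothetical name, not part of the paper. It maps each Gaussian point to x with (15) and scales the weighted sum by (b − a)/2 as in (17). The two-point rule from (8) is used for brevity.

```python
import math

# Two-point rule from (8), stored as (lambda_i, w_i) pairs
GAUSS2 = [(-1.0 / math.sqrt(3.0), 1.0), (1.0 / math.sqrt(3.0), 1.0)]


def gauss_ab(f, a, b, rule=GAUSS2):
    """Approximate the integral of f over [a, b] following (13)-(17)."""
    h0 = 0.5 * (b + a)  # (13)
    h1 = 0.5 * (b - a)  # (14)
    return h1 * sum(w * f(h0 + h1 * lam) for lam, w in rule)


# Cubic and quadratic test integrands, both integrated exactly by two points
integral_cubic = gauss_ab(lambda x: x**3, 0.0, 2.0)   # exact value is 4
integral_square = gauss_ab(lambda x: x**2, 1.0, 3.0)  # exact value is 26/3
```

Because the mapping is linear, a polynomial in x remains a polynomial of the same degree in λ, so the exactness of the rule carries over to any finite interval.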

3. Evaluating Percentiles of the T-Distribution

The t-distribution is short for Student's t-distribution, which was proposed by William Sealy Gosset. It is used to infer the mean of a normally distributed population from a small sample when the population variance is unknown. The quantitative analysis of sampling error therefore commonly involves the t-distribution.
The probability density function of the t-distribution with n degrees of freedom is
$$ T_n(x) = \frac{\Gamma\left( (n+1)/2 \right)}{\sqrt{n\pi}\; \Gamma(n/2)} \left( 1 + \frac{x^2}{n} \right)^{-(n+1)/2} \tag{18} $$
where n is the number of degrees of freedom and
$$ \Gamma(\upsilon) = \int_{0}^{\infty} t^{\upsilon - 1} e^{-t}\, dt \tag{19} $$
is the Gamma function.
When n is a natural number, the Gamma functions appearing in (18) can be evaluated with
$$ \Gamma(n) = (n - 1)! \tag{20} $$
and
$$ \Gamma\!\left( n + \frac{1}{2} \right) = \frac{(2n)!\, \sqrt{\pi}}{4^{n}\, n!} \tag{21} $$
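As a rough check of these formulas, the sketch below compares the two closed forms with Python's `math.gamma` and evaluates the density in (18). The function names are illustrative assumptions. Evaluating the coefficient through `math.lgamma` is a choice made here so that large n does not overflow; the paper does not state how it evaluates the Gamma function.

```python
import math


def gamma_half_integer(n):
    """Gamma(n + 1/2) from the closed form (2n)! * sqrt(pi) / (4^n * n!)."""
    return math.factorial(2 * n) * math.sqrt(math.pi) / (4**n * math.factorial(n))


def t_pdf(x, n):
    """Density of the t-distribution with n degrees of freedom, as in (18)."""
    log_coef = (math.lgamma(0.5 * (n + 1)) - math.lgamma(0.5 * n)
                - 0.5 * math.log(n * math.pi))
    return math.exp(log_coef - 0.5 * (n + 1) * math.log1p(x * x / n))


# For n = 1 the density reduces to the Cauchy density 1 / (pi * (1 + x^2))
peak_n1 = t_pdf(0.0, 1)
```

For n = 1 the coefficient is Γ(1)/(√π Γ(1/2)) = 1/π, which makes that case convenient for testing an implementation against a known closed form.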
To evaluate the $100(1 - \alpha)$ percentile shown in Figure 1, $t_{\alpha,n}$ must be determined. It is defined such that the probability that a random variable $T_n$ with a t-distribution exceeds $t_{\alpha,n}$ is $\alpha$:
$$ P(T_n > t_{\alpha,n}) = \alpha \tag{22} $$
or
$$ \int_{t_{\alpha,n}}^{\infty} \frac{\Gamma\left( (n+1)/2 \right)}{\sqrt{n\pi}\; \Gamma(n/2)} \left( 1 + \frac{x^2}{n} \right)^{-(n+1)/2} dx = \alpha \tag{23} $$
where $\alpha$ is a function of $t_{\alpha,n}$. Because the inverse of this function has no closed form, $t_{\alpha,n}$ is found by Newton's iteration method. Equation (23) is rewritten as
$$ g(t_{\alpha,n}) = \int_{t_{\alpha,n}}^{\infty} \frac{\Gamma\left( (n+1)/2 \right)}{\sqrt{n\pi}\; \Gamma(n/2)} \left( 1 + \frac{x^2}{n} \right)^{-(n+1)/2} dx - \alpha \tag{24} $$
Figure 1. The 100(1 − α) percentile tα,n corresponding to degrees of freedom n.
Then, $t_{\alpha,n}$ is the root of
$$ g(t_{\alpha,n}) = 0 \tag{25} $$
Starting from an initial guess $t_{\alpha,n,0}$, the iteration
$$ t_{\alpha,n,new} = t_{\alpha,n,old} - \frac{g(t_{\alpha,n,old})}{g'(t_{\alpha,n,old})} \tag{26} $$
is conducted. The derivative $g'(t_{\alpha,n,old})$ in (26) is obtained by the following numerical differentiation:
$$ g'(t_{\alpha,n,old}) = \lim_{\Delta t_{\alpha,n} \to 0} \frac{g(t_{\alpha,n,old} + \Delta t_{\alpha,n}) - g(t_{\alpha,n,old})}{\Delta t_{\alpha,n}} \tag{27} $$
Because the probability density function of the t-distribution is symmetric about x = 0, $P(T_n > t_{\alpha,n}) = 0.5 - P(0 < T_n < t_{\alpha,n})$, so only the finite interval $(0, t_{\alpha,n})$ needs to be integrated. The five-point Gaussian integration method is adopted, and the interval $(0, t_{\alpha,n})$ is divided into $10^6$ equally spaced subintervals. With an initial guess of 1.0, $\Delta t_{\alpha,n}$ in (27) is set to $10^{-5}$, and the iteration is stopped when the magnitude of $g(t_{\alpha,n,old}) / g'(t_{\alpha,n,old})$ becomes smaller than $10^{-5}$. The $100(1 - \alpha)$ percentiles $t_{\alpha,n}$ for degrees of freedom n from 1 through 120 and $\alpha$ equal to 0.200, 0.100, 0.050, 0.025, 0.010, 0.005, 0.001, and 0.0005 are listed in Table 2.
Table 2. The $100(1 - \alpha)$ percentiles $t_{\alpha,n}$ for degrees of freedom n from 1 through 120 and $\alpha$ equal to 0.200, 0.100, 0.050, 0.025, 0.010, 0.005, 0.001, and 0.0005.
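The procedure described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the author's program: all function names are hypothetical, the five-point rule is written in its standard closed form, the tail probability is taken as 0.5 minus the integral over $(0, t_{\alpha,n})$, and the default of 2000 subintervals is chosen here only to keep a pure-Python run short (the paper uses $10^6$).

```python
import math

# Five-point Gauss-Legendre rule in closed form, as (lambda_i, w_i) pairs
_S = math.sqrt(10.0 / 7.0)
_X1, _X2 = math.sqrt(5.0 - 2.0 * _S) / 3.0, math.sqrt(5.0 + 2.0 * _S) / 3.0
_W1 = (322.0 + 13.0 * math.sqrt(70.0)) / 900.0
_W2 = (322.0 - 13.0 * math.sqrt(70.0)) / 900.0
GAUSS5 = [(0.0, 128.0 / 225.0), (-_X1, _W1), (_X1, _W1), (-_X2, _W2), (_X2, _W2)]


def t_pdf(x, n):
    """Density of the t-distribution with n degrees of freedom."""
    log_coef = (math.lgamma(0.5 * (n + 1)) - math.lgamma(0.5 * n)
                - 0.5 * math.log(n * math.pi))
    return math.exp(log_coef - 0.5 * (n + 1) * math.log1p(x * x / n))


def integrate(f, a, b, nsect):
    """Composite five-point Gaussian integration over nsect equal subintervals."""
    h = (b - a) / nsect
    half = 0.5 * h
    total = 0.0
    for k in range(nsect):
        mid = a + (k + 0.5) * h
        total += sum(w * f(mid + half * lam) for lam, w in GAUSS5)
    return half * total


def g(t, alpha, n, nsect):
    """g(t) = P(T_n > t) - alpha, using the symmetry of the density about 0."""
    return 0.5 - integrate(lambda x: t_pdf(x, n), 0.0, t, nsect) - alpha


def t_percentile(alpha, n, guess=1.0, dt=1e-5, tol=1e-5, nsect=2000, max_iter=100):
    """Newton iteration with a forward-difference estimate of g'(t)."""
    t = guess
    for _ in range(max_iter):
        g_old = g(t, alpha, n, nsect)
        slope = (g(t + dt, alpha, n, nsect) - g_old) / dt
        step = g_old / slope
        t -= step
        if abs(step) < tol:
            return t
    raise RuntimeError("Newton iteration did not converge")


t_025_10 = t_percentile(0.025, 10)  # standard tables list 2.228
t_050_1 = t_percentile(0.05, 1)     # for n = 1 the exact value is 1/tan(0.05*pi)
```

The n = 1 case is a convenient sanity check because the tail probability has the closed form $0.5 - \arctan(t)/\pi$, so the percentile can be compared with $1/\tan(\pi\alpha)$.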

4. Conclusions

The derivation of the Gaussian numerical integration in (1)–(17) shows that when m = 2, two weighting coefficients, $w_1$ and $w_2$, and two sampling points, $\lambda_1$ and $\lambda_2$, must be determined, giving four unknowns, so the integrand is fitted as a cubic polynomial. In general, the form of the integrand assumed in (1) is
$$ f(\lambda) = \sum_{i=0}^{2m-1} a_i \lambda^{i} \tag{28} $$
where m is the number of sampling points in the Gaussian integration. For an integrand of this form, integration with (1) has theoretically no error even if the integration interval is not subdivided. However, since the probability density function of the t-distribution in (18) is not a finite polynomial of the form (28), the integration interval must be partitioned appropriately before Gaussian integration with m sampling points is applied to each subinterval and the results are summed.
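The following sketch illustrates both statements for m = 5; the names and test integrands are assumptions chosen for illustration. A single application of the five-point rule integrates $\lambda^8$ (degree at most 2m − 1 = 9) exactly but not $\lambda^{10}$, and for the n = 1 density, whose integral has the closed form $\arctan(x)/\pi$, partitioning the interval removes the error that a single application leaves.

```python
import math

# Five-point Gauss-Legendre rule in closed form, as (lambda_i, w_i) pairs
_S = math.sqrt(10.0 / 7.0)
_X1, _X2 = math.sqrt(5.0 - 2.0 * _S) / 3.0, math.sqrt(5.0 + 2.0 * _S) / 3.0
_W1 = (322.0 + 13.0 * math.sqrt(70.0)) / 900.0
_W2 = (322.0 - 13.0 * math.sqrt(70.0)) / 900.0
GAUSS5 = [(0.0, 128.0 / 225.0), (-_X1, _W1), (_X1, _W1), (-_X2, _W2), (_X2, _W2)]


def integrate(f, a, b, nsect=1):
    """Five-point Gaussian integration applied to nsect equal subintervals."""
    h = (b - a) / nsect
    half = 0.5 * h
    total = 0.0
    for k in range(nsect):
        mid = a + (k + 0.5) * h
        total += sum(w * f(mid + half * lam) for lam, w in GAUSS5)
    return half * total


# Polynomial integrands on [-1, +1]: exact values are 2/9 and 2/11
err_deg8 = abs(integrate(lambda x: x**8, -1.0, 1.0) - 2.0 / 9.0)
err_deg10 = abs(integrate(lambda x: x**10, -1.0, 1.0) - 2.0 / 11.0)


def cauchy_pdf(x):
    """t-distribution density for n = 1."""
    return 1.0 / (math.pi * (1.0 + x * x))


# Non-polynomial integrand on [0, 10]: exact value is atan(10)/pi
exact = math.atan(10.0) / math.pi
err_single = abs(integrate(cauchy_pdf, 0.0, 10.0, nsect=1) - exact)
err_composite = abs(integrate(cauchy_pdf, 0.0, 10.0, nsect=100) - exact)
```

The single-interval error for the density is of the order of 0.01, whereas with 100 subintervals it falls to the level of rounding error.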
The probability density function $T_n(x)$ of the t-distribution with n degrees of freedom in (18) indicates that the $100(1 - \alpha)$ percentile $t_{\alpha,n}$ is particularly difficult to calculate accurately for n = 1, especially the $100(1 - 0.0005)$ percentile $t_{0.0005,1} = 636.619$. This value is therefore calculated to verify the correctness of the $100(1 - \alpha)$ percentiles $t_{\alpha,n}$ of the t-distribution listed in Table 2. The Newton–Raphson method in (26) and (27) is used to calculate the percentiles of the t-distribution, with an initial guess of 1.0 and an increment $\Delta t_{\alpha,n}$ of $10^{-5}$ in (27). Iteration stops when the magnitude of the adjustment $g(t_{\alpha,n,old}) / g'(t_{\alpha,n,old})$ in (26) is smaller than the specified threshold of $10^{-5}$.
For n = 1, $P(T_1 > t_{0.0005,1}) = \alpha = 0.0005$. Using the Gaussian integration method with m = 5 sampling points and dividing the interval from 0 to $t_{0.0005,1}$ into $10^6$ equal subintervals, the result is $t_{0.0005,1} = 636.619249$ after 15 iterations. The interval from 0 to $t_{0.0005,1}$ is also partitioned into NSECT = 100, 175, 250, 500, 750, 1000, 2500, 5000, $10^4$, $10^5$, $10^6$, and $10^7$ equal subintervals, and the results are listed in Table 3. Partitioning the interval into 1000 to 2500 subintervals yields a $100(1 - 0.0005)$ percentile approaching 636.619249. Moreover, as NSECT increases further, the calculated result remains 636.619249, indicating that rounding errors do not affect the computation.
Table 3. The $100(1 - 0.0005)$ percentiles $t_{0.0005,1}$ obtained by partitioning the interval from 0 to $t_{0.0005,1}$ into NSECT = 100, 175, 250, 500, 750, 1000, 2500, 5000, $10^4$, $10^5$, $10^6$, and $10^7$ equal subintervals.
To verify the accuracy of $t_{0.0005,1} = 636.619249$, this value is substituted into (23). The interval from 0 to 636.619249 is partitioned into NSECT = 100, 175, 250, 500, 750, 1000, 2500, 5000, $10^4$, $10^5$, $10^6$, and $10^7$ equal subintervals, and the probability $P(T_1 > 636.619249)$ that a random variable with the t-distribution exceeds 636.619249 is calculated (Table 4). If the number of subintervals NSECT exceeds 750, the probability $P(T_1 > 636.619249)$ is accurately calculated as 0.000500.
Table 4. The probability $P(T_1 > 636.619249)$ that a random variable with the t-distribution exceeds 636.619249, obtained by partitioning the interval from 0 to 636.619249 into NSECT = 100, 175, 250, 500, 750, 1000, 2500, 5000, $10^4$, $10^5$, $10^6$, and $10^7$ equal subintervals.
The negative result for NSECT = 100 arises because 100 subintervals are too few: the integration error increases and yields a probability $P(0 < T_1 < 636.619249)$ greater than 0.5. Thus, it is reasonable to conclude that setting the increment $\Delta t_{\alpha,n}$ to $10^{-5}$ in (27), stopping the iteration when the magnitude of the adjustment $g(t_{\alpha,n,old}) / g'(t_{\alpha,n,old})$ in (26) is smaller than the threshold of $10^{-5}$, and using $10^6$ subintervals are valid choices. Each of the $100(1 - \alpha)$ percentiles of the t-distribution in Table 2 is calculated in five seconds. Furthermore, rounding the values in Table 2 to three decimal places yields results that exactly match those of the authoritative published tables [6].
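The effect of NSECT on this validation can be explored with a short sketch. It is an illustration under assumptions rather than the author's program: the names are hypothetical, only three NSECT values are run, and the n = 1 density is written directly as $1/[\pi(1 + x^2)]$, for which the tail probability has the closed form $0.5 - \arctan(t)/\pi$ that serves as the reference.

```python
import math

# Five-point Gauss-Legendre rule in closed form, as (lambda_i, w_i) pairs
_S = math.sqrt(10.0 / 7.0)
_X1, _X2 = math.sqrt(5.0 - 2.0 * _S) / 3.0, math.sqrt(5.0 + 2.0 * _S) / 3.0
_W1 = (322.0 + 13.0 * math.sqrt(70.0)) / 900.0
_W2 = (322.0 - 13.0 * math.sqrt(70.0)) / 900.0
GAUSS5 = [(0.0, 128.0 / 225.0), (-_X1, _W1), (_X1, _W1), (-_X2, _W2), (_X2, _W2)]


def cauchy_pdf(x):
    """t-distribution density for n = 1."""
    return 1.0 / (math.pi * (1.0 + x * x))


def tail_probability(t, nsect):
    """P(T_1 > t) computed as 0.5 minus the composite integral over (0, t)."""
    h = t / nsect
    half = 0.5 * h
    total = 0.0
    for k in range(nsect):
        mid = (k + 0.5) * h
        for lam, w in GAUSS5:
            total += w * cauchy_pdf(mid + half * lam)
    return 0.5 - half * total


T = 636.619249
results = {nsect: tail_probability(T, nsect) for nsect in (100, 1000, 5000)}
reference = 0.5 - math.atan(T) / math.pi  # closed form for n = 1

for nsect, prob in results.items():
    print(f"NSECT = {nsect:5d}  P(T1 > {T}) = {prob:.6f}")
```

With 100 subintervals the first subinterval spans roughly 0 to 6.4, where the density falls sharply from its peak, so the five points overestimate the integral there and the computed tail probability turns negative. With 5000 subintervals the result agrees with the closed form.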

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Montgomery, D.C. Design and Analysis of Experiments; John Wiley & Sons: New York, NY, USA, 2001.
  2. Bain, L.J.; Engelhardt, M. Introduction to Probability and Mathematical Statistics; Duxbury Press: Belmont, CA, USA, 1992.
  3. Devore, J.L. Probability and Statistics for Engineering and the Sciences; Brooks/Cole: Pacific Grove, CA, USA, 2012.
  4. Zienkiewicz, O.C. The Finite Element Method; McGraw-Hill Book Co.: Maidenhead, UK, 1977.
  5. Bathe, K.J. Finite Element Procedures in Engineering Analysis; Prentice-Hall: Upper Saddle River, NJ, USA, 1982.
  6. Pearson, E.S.; Hartley, H.O. Biometrika Tables for Statisticians; Cambridge University Press: Cambridge, UK, 1966.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
