Article

Smooth Tests of Fit for the Lindley Distribution

1 School of Mathematical and Physical Sciences, University of Newcastle, Callaghan, NSW 2308, Australia
2 National Institute for Applied Statistics Research Australia, University of Wollongong, Wollongong, NSW 2522, Australia
* Author to whom correspondence should be addressed.
Stats 2018, 1(1), 92-97; https://doi.org/10.3390/stats1010007
Submission received: 14 June 2018 / Revised: 13 July 2018 / Accepted: 13 July 2018 / Published: 22 July 2018

Abstract

We consider the little-known one-parameter Lindley distribution. This distribution may be of interest as it appears to be more flexible than the exponential distribution, fitting a wider range of data. We give smooth tests of fit for this distribution. The smooth test for the Lindley has power comparable with that of the Anderson–Darling test. Advantages of the smooth test are discussed, and examples that illustrate the flexibility of this distribution are given.

1. Introduction

The Lindley distribution was introduced by Lindley [1] to analyze failure time data. It belongs to the exponential family and can be written as a mixture of an exponential and a gamma distribution. Its motivation arises from its ability to model failure data with increasing, decreasing, unimodal, and bathtub-shaped hazard rates; see [2]. Ghitany et al. [3] give many properties of the Lindley distribution. They suggest it is often a better model than the traditional exponential distribution that is commonly used to model lifetime or waiting time data. For example, the exponential distribution is limited to lifetime data with coefficient of variation 1, skewness 2, and kurtosis 6. The Lindley distribution allows a greater range for these coefficients: namely 1/√2 to 1, √2 to 2, and 6 to 9 respectively. Similarly, the hazard rate function of the exponential is more limited than that of the Lindley.
Ghitany et al. [3] examine the fit of the Lindley distribution to some waiting time data by looking at plots and by showing the Lindley likelihood is better than the exponential likelihood. However, this does not demonstrate that the Lindley distribution fits the data well, only that it fits better than the exponential. Assessment of the plots is subjective and here we derive a smooth test of fit to give a more objective assessment of goodness of fit of the Lindley model. We also consider the widely used Anderson-Darling test.
The Lindley distribution has probability density function
f(x; θ) = {θ²/(θ + 1)} (1 + x) e^(−θx)   for x > 0 and zero otherwise, in which θ > 0,
and cumulative distribution function
F(x; θ) = 1 − {(θ + 1 + θx)/(θ + 1)} e^(−θx)   for x > 0 and zero otherwise.
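For readers who want to work with these formulas numerically, the density and distribution function translate directly into code. The sketch below is a minimal Python version (our own illustration; the paper's computations were done in R/IMSL, and the function names here are ours), with the CDF checked against a numerical integration of the density.

```python
import math

def lindley_pdf(x, theta):
    """Lindley density f(x; theta) = theta^2/(theta + 1) * (1 + x) * exp(-theta x), x > 0."""
    if x < 0:
        return 0.0
    return theta**2 / (theta + 1.0) * (1.0 + x) * math.exp(-theta * x)

def lindley_cdf(x, theta):
    """Lindley CDF F(x; theta) = 1 - (theta + 1 + theta x)/(theta + 1) * exp(-theta x), x > 0."""
    if x <= 0:
        return 0.0
    return 1.0 - (theta + 1.0 + theta * x) / (theta + 1.0) * math.exp(-theta * x)

def integrate_pdf(upper, theta, steps=200_000):
    """Trapezoidal integral of the density on [0, upper]; should agree with the CDF."""
    h = upper / steps
    total = 0.5 * (lindley_pdf(0.0, theta) + lindley_pdf(upper, theta))
    total += sum(lindley_pdf(i * h, theta) for i in range(1, steps))
    return total * h
```

Agreement of `integrate_pdf` with `lindley_cdf` is a quick consistency check on the two formulas.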
Smooth tests of fit can be constructed using the second- and third-order smooth test components. Section 2 discusses smooth testing for goodness of fit. Section 3 examines the approach of the smooth test statistics to their asymptotic chi-squared distributions and finds the convergence to be quite slow; it is suggested that p-values be found using the parametric bootstrap. A slightly expanded version of an algorithm in Ghitany et al. [3] for generating random Lindley variates is given in Section 3 so that these p-values can be calculated. Some powers of the smooth test and the Anderson–Darling test are compared in Section 4, while Section 5 gives two examples.

2. The Smooth Test Statistics

Tests of fit are an important element in determining the suitability of statistical models. The genesis of the smooth tests is Neyman [4], who nested the null probability density function in a family of distributions indexed by the elements of a vector parameter and tested if that parameter was consistent with zero. Neyman only considered testing for distributions with no nuisance parameters and hence, after transforming by the probability integral transformation, testing for the simple uniform (0, 1) distribution. Best and Rayner [5] give an early discussion of smooth tests for distributions with nuisance parameters by considering a smooth test for normality. The subject has undergone considerable development, especially in recent years. See, for example, [6,7].
Of particular interest here is [6] (Section 9.2) in which the following is shown. Suppose we have a random sample X1, …, Xn from a distribution hypothesized to have probability density function f(x; β) in which β = (β1, …, βq)T is a q × 1 vector of nuisance parameters. Moreover, assume that the densities f(x; β) form a multi-parameter exponential family. An alternative probability density function is taken to be
g_k(x; θ, β) = C(θ, β) exp{Σ_{i=1}^{k} θ_i h_i(x; β)} f(x; β)
in which θ = (θ1, …, θk)T, it is assumed that a normalizing constant C(θ, β) exists, and {hi(x; β)} is a set of functions orthonormal with respect to the weight function f(x; β).
When testing H: θ = 0 against K: θ ≠ 0 the smooth test statistic is the sum of squares of the components Vq+1, …, Vk, that is, V²q+1 + ⋯ + V²k, in which Vr = Σ_{j=1}^{n} hr(Xj; β̂)/√n, β̂ being the maximum likelihood estimator of β when θ = 0. The model gk(x; θ, β) is over-parametrized, with θ1, …, θq playing the same role as β1, …, βq. One way of dealing with this technically is to drop θ1, …, θq from gk(x; θ, β). The over-parametrization is apparent when it is realized that the likelihood equations are equivalent to Σ_{j=1}^{n} hr(Xj; β)/√n = 0 for r = 1, …, q, and so V1 ≡ ⋯ ≡ Vq ≡ 0. The non-degenerate components are asymptotically independent and asymptotically χ²₁ distributed.
Since the components Vr involve the orthonormal polynomials, we now give the orthonormal polynomials of a random variable X up to order three. These results are true for any distribution for which the moments exist. The notation suppresses the dependence on the nuisance parameter. We take h0(x) = 1 for all x. Write μ for the mean of X and μr, r = 2, 3, … for the central moments of X. Then
h1(x) = (x − μ)/√μ2,
h2(x) = {(x − μ)² − μ3(x − μ)/μ2 − μ2}/√d, in which d = μ4 − μ3²/μ2 − μ2², and
h3(x) = {(x − μ)³ − a(x − μ)² − b(x − μ) − c}/√e, in which
a = (μ5 − μ3μ4/μ2 − μ2μ3)/d,
b = (μ4²/μ2 − μ2μ4 − μ3μ5/μ2 + μ3²)/d,
c = (2μ3μ4 − μ3³/μ2 − μ2μ5)/d, and
e = μ6 − 2aμ5 + (a² − 2b)μ4 + 2(ab − c)μ3 + (b² + 2ac)μ2 + c².
Should further orthonormal polynomials be required the recurrence relation in [8] can be used. See [9] (Appendix D) for details of the numerical construction of orthogonal polynomials in R.
The moments about the origin of the Lindley distribution are
E[X^r] = r!(θ + r + 1)/{θ^r(θ + 1)}, r = 1, 2, …,
from which the central moments can be found. In particular
μ = (θ + 2)/{θ(θ + 1)},  μ2 = (θ² + 4θ + 2)/{θ²(θ + 1)²},
μ3 = 2(θ³ + 6θ² + 6θ + 2)/{θ³(θ + 1)³}, and
μ4 = 3(3θ⁴ + 24θ³ + 44θ² + 32θ + 8)/{θ⁴(θ + 1)⁴}.
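These central moments can be checked mechanically from the raw-moment formula via the binomial expansion μr = Σ_{k} C(r, k) E[X^k](−μ)^(r−k). A short Python sketch (our own check, not from the paper):

```python
import math

def raw_moment(r, theta):
    """E[X^r] = r!(theta + r + 1)/{theta^r (theta + 1)}; also valid at r = 0, giving 1."""
    return math.factorial(r) * (theta + r + 1) / (theta**r * (theta + 1))

def central_moments(theta, upto=6):
    """Mean and central moments mu_2, ..., mu_upto from the raw moments."""
    mu = raw_moment(1, theta)
    cm = {r: sum(math.comb(r, k) * raw_moment(k, theta) * (-mu)**(r - k)
                 for k in range(r + 1))
          for r in range(2, upto + 1)}
    return mu, cm
```

At θ = 1, for instance, this gives μ = 1.5, μ2 = 1.75, μ3 = 3.75 and μ4 = 20.8125, matching the closed forms above.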
For the exponential distribution the moments are relatively simple and lead to
h1(x) = 1 − x/x̄,  h2(x) = 1 − 2x/x̄ + (x/x̄)²/2, and h3(x) = 1 − 3x/x̄ + 3(x/x̄)²/2 − (x/x̄)³/6.
The early literature suggested that the components Vr were diagnostic. That is, if the test statistic V q + 1 2 + + V k 2 was found to be significant at some prescribed level then the Vr could be used to diagnose the model failure. For example, a significant V3 would indicate the third moment of the hypothesized distribution was not consistent with the data. In fact, a significant Vr could be attributed to moments up to the 2rth. However, in most applications it is reasonable to say that a significant Vr suggests model failure in moments up to the rth. This could be confirmed (or not) by plotting the data.
For distributions from a one parameter exponential family the maximum likelihood and method of moments estimators coincide. It follows that θ ^ can be readily found by solving X ¯ = μ ^ = ( θ ^ + 2 ) / { θ ^ ( θ ^ + 1 ) } , which gives
θ̂ = [1 − X̄ + √{(X̄ − 1)² + 8X̄}]/(2X̄)   provided X̄ > 0.
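The estimator is a one-liner in code. A minimal Python sketch (ours, not the paper's), checking that θ̂ actually solves the moment equation X̄ = μ(θ̂):

```python
import math

def theta_mle(xbar):
    """Closed-form MLE (= moment estimator) of theta from the sample mean, xbar > 0."""
    return (1.0 - xbar + math.sqrt((xbar - 1.0)**2 + 8.0 * xbar)) / (2.0 * xbar)

def lindley_mean(theta):
    """Population mean mu = (theta + 2)/{theta (theta + 1)}."""
    return (theta + 2.0) / (theta * (theta + 1.0))
```

Substituting `theta_mle(xbar)` back into `lindley_mean` should recover `xbar`, which is a useful unit test when implementing the smooth test.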
That V1 is degenerate is clear from V1 = Σ_{i=1}^{n} (Xi − μ̂)/√(nμ̂2) = (X̄ − μ̂)√(n/μ̂2) ≡ 0.
The choice of the order k of the smooth test is typically a trade-off between the test power and the alternatives detected. A larger k means the test is more omnibus, seeking to detect alternatives to the null in a richer family of distributions. A smaller k gives a more focused test with greater power for the alternatives for which the test is sensitive, but no power for other alternatives. Based solely on intuition Neyman [4] took k to be four. Rayner et al. [6] discuss modeling approaches for the choice of k. A larger k requires more orthonormal polynomials and hence more moments to calculate them. Here we make the pragmatic option of considering only two non-degenerate components through V 2 2 , V 3 2 and S = V 2 2 + V 3 2 . In the next section we briefly consider the approach of these test statistics to their asymptotic chi-squared distributions. Use of S is similar to the test in [10] which is also based on the first two non-zero smooth test components.
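Putting the pieces of this section together, V2², V3² and S can be computed for a sample as follows. This is our own Python sketch of the procedure, not code from the paper: estimate θ, evaluate the order-2 and order-3 orthonormal polynomials at the estimated moments, and sum.

```python
import math

def raw_moment(r, theta):
    """Lindley raw moment E[X^r] = r!(theta + r + 1)/{theta^r (theta + 1)}."""
    return math.factorial(r) * (theta + r + 1) / (theta**r * (theta + 1))

def central_moments(theta, upto=6):
    """Mean and central moments mu_2..mu_upto via the binomial expansion."""
    mu = raw_moment(1, theta)
    cm = {r: sum(math.comb(r, k) * raw_moment(k, theta) * (-mu)**(r - k)
                 for k in range(r + 1))
          for r in range(2, upto + 1)}
    return mu, cm

def theta_mle(xbar):
    """Closed-form MLE of theta from the sample mean."""
    return (1.0 - xbar + math.sqrt((xbar - 1.0)**2 + 8.0 * xbar)) / (2.0 * xbar)

def smooth_statistics(data):
    """Return (V2^2, V3^2, S) for the Lindley smooth test of fit."""
    n = len(data)
    mu, m = central_moments(theta_mle(sum(data) / n))
    m2, m3, m4, m5, m6 = (m[r] for r in range(2, 7))
    # constants of the order-2 and order-3 orthonormal polynomials (Section 2)
    d = m4 - m3**2 / m2 - m2**2
    a = (m5 - m3 * m4 / m2 - m2 * m3) / d
    b = (m4**2 / m2 - m2 * m4 - m3 * m5 / m2 + m3**2) / d
    c = (2 * m3 * m4 - m3**3 / m2 - m2 * m5) / d
    e = (m6 - 2 * a * m5 + (a**2 - 2 * b) * m4
         + 2 * (a * b - c) * m3 + (b**2 + 2 * a * c) * m2 + c**2)
    v2 = sum(((x - mu)**2 - m3 * (x - mu) / m2 - m2) / math.sqrt(d)
             for x in data) / math.sqrt(n)
    v3 = sum(((x - mu)**3 - a * (x - mu)**2 - b * (x - mu) - c) / math.sqrt(e)
             for x in data) / math.sqrt(n)
    return v2**2, v3**2, v2**2 + v3**2
```

A useful self-check on such an implementation is the degeneracy of V1: since X̄ equals the fitted mean, the first component vanishes identically.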

3. The Approach to the Asymptotic Distribution

In Table 1 we look, for θ = 0.5 and 1.5, at the approach of V2² and V3² to their asymptotic χ²₁ distribution and of S = V2² + V3² to its asymptotic χ²₂ distribution. The results in Table 1 are 5% critical values found using 100,000 Monte Carlo samples of size n. A random variate generator, given below, is needed for these results, for the powers of the next section and for bootstrap p-values.
The convergence to the asymptotic values is quite slow and so we suggest in applications that for smaller sample sizes the parametric bootstrap will be needed to find p-values. The Table 1 results are similar for θ = 0.5 and θ = 1.5.
To generate random Lindley values, we follow [3] and observe that the Lindley distribution is a mixture of an exponential(θ) distribution and a gamma(2, θ) distribution, with mixing weights θ/(θ + 1) and 1/(θ + 1). To obtain a random x value we need four uniform(0, 1) values, u1, u2, u3 and u4 say, and take x = −{log(u1 u2)}/θ unless u4 ≤ θ/(θ + 1), in which case x = −(log u3)/θ.
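The mixture algorithm above is easily coded. A Python sketch (our own, seeded for reproducibility; the original description assumes a stream of uniforms from R/IMSL-style software):

```python
import math
import random

def lindley_variate(theta, rng):
    """One Lindley(theta) draw: exponential(theta) with probability theta/(theta + 1),
    otherwise gamma(2, theta), exactly as in the mixture algorithm above."""
    u1, u2, u3, u4 = (rng.random() for _ in range(4))
    if u4 <= theta / (theta + 1.0):
        return -math.log(u3) / theta       # exponential(theta)
    return -math.log(u1 * u2) / theta      # gamma(2, theta) as a sum of two exponentials

rng = random.Random(2018)
sample = [lindley_variate(1.0, rng) for _ in range(100_000)]
```

With θ = 1 the sample mean and variance should sit near the population values μ = 1.5 and μ2 = 1.75, which gives a quick check on the generator.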
To find parametric bootstrap p-values generate Lindley samples of size n many (10,000 subsequently) times and take the p-values as the proportion of the samples with test statistics greater than or equal to the values of the test statistics for the original dataset. To find parametric bootstrap powers for each alternative distribution generate many (10,000 subsequently) samples of size n. Then the power is the proportion of these samples with p-value less than the significance level.
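In code, the bootstrap loop looks like the following Python sketch (our illustration; `statistic` stands for any of the tests discussed here and is assumed to re-estimate θ internally on each sample):

```python
import math
import random

def theta_mle(xbar):
    """Closed-form estimator of theta from the sample mean (Section 2)."""
    return (1.0 - xbar + math.sqrt((xbar - 1.0)**2 + 8.0 * xbar)) / (2.0 * xbar)

def lindley_sample(n, theta, rng):
    """n Lindley(theta) variates via the exponential/gamma(2, theta) mixture."""
    p = theta / (theta + 1.0)
    draws = []
    for _ in range(n):
        if rng.random() <= p:
            draws.append(-math.log(1.0 - rng.random()) / theta)
        else:
            draws.append(-math.log((1.0 - rng.random()) * (1.0 - rng.random())) / theta)
    return draws

def bootstrap_pvalue(statistic, data, n_boot=10_000, seed=0):
    """Parametric bootstrap p-value: the proportion of Lindley samples of the same
    size, generated at the fitted theta, whose statistic is >= the observed one."""
    rng = random.Random(seed)
    n = len(data)
    t_obs = statistic(data)
    theta_hat = theta_mle(sum(data) / n)
    exceed = sum(statistic(lindley_sample(n, theta_hat, rng)) >= t_obs
                 for _ in range(n_boot))
    return exceed / n_boot
```

Powers are obtained the same way: generate samples from the alternative, compute a bootstrap p-value for each, and report the proportion below the significance level.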

4. Power Comparisons

For a significance level of α = 0.05, Table 2, Table 3 and Table 4 give powers for tests based on V2², V3², S and AD for sample sizes n = 20, 50 and 100 respectively, where AD is the Anderson–Darling test statistic
AD = −n − (1/n) Σ_{i=1}^{n} (2i − 1){log z_(i) + log(1 − z_(n+1−i))}
in which {z(i)} are ordered values of {zi}, where zi = F(xi; θ). The Anderson-Darling test is included as it is well-known and usually performs well in power studies for other distributions.
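A Python sketch of AD for the Lindley (our own code; θ is estimated as in Section 2 and the z's use the Lindley CDF):

```python
import math

def lindley_cdf(x, theta):
    """Lindley CDF for x > 0."""
    return 1.0 - (theta + 1.0 + theta * x) / (theta + 1.0) * math.exp(-theta * x)

def theta_mle(xbar):
    """Closed-form estimator of theta from the sample mean (Section 2)."""
    return (1.0 - xbar + math.sqrt((xbar - 1.0)**2 + 8.0 * xbar)) / (2.0 * xbar)

def anderson_darling(data):
    """AD = -n - (1/n) sum_{i=1}^{n} (2i - 1){log z_(i) + log(1 - z_(n+1-i))}."""
    n = len(data)
    theta_hat = theta_mle(sum(data) / n)
    z = sorted(lindley_cdf(x, theta_hat) for x in data)   # ordered z_(1) <= ... <= z_(n)
    s = sum((2 * i - 1) * (math.log(z[i - 1]) + math.log(1.0 - z[n - i]))
            for i in range(1, n + 1))
    return -n - s / n
```

As with the smooth statistics, p-values for AD would be found by the parametric bootstrap of Section 3.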
The tests based on S and AD have similar powers. Notice that we used critical values such that all four of the tests we examined had equal test size. This is necessary so that power advantages are not due to poor approximation to the finite sample null distribution by, say, use of the asymptotic critical values. Random values from the alternative distributions were generated using the IMSL software package. Alternatives were chosen to give powers not all near 0.05 or 1.00, to demonstrate differences in the effectiveness of the tests.
The test based on V 2 2 has slightly less power than the tests based on S and AD. It has little power if the alternative has similar variance to the Lindley variance, which is quite reasonable as it is particularly sensitive to distributions with the Lindley variance. The test based on V 3 2 is, roughly, testing for distributions with the Lindley skewness when V 2 2 is not significant. This is why it is useful to apply V 2 2 and V 3 2 together, either separately as in exploratory data analysis, or more formally together, via S.

5. Examples

In both of the following examples the bootstrap p-values are based on 1000 samples.
Waiting Time Data. Ghitany et al. [3] give the waiting times (in minutes) before service of 100 bank customers. On the basis of a superior log-likelihood they conclude that the Lindley distribution gives a better fit than the exponential.
In testing for the Lindley distribution, we find bootstrap p-values for the tests based on V 2 2 , V 3 2 , S and AD to be 0.61, 0.49, 0.70 and 0.50 respectively. Using the asymptotic chi-squared distributions the p-values for V 2 2 , V 3 2 and S are 0.62, 0.60 and 0.77 respectively. We can conclude that for a sample size this large there is acceptable agreement between the p-values based on the bootstrap and the asymptotic chi-squared distributions. Moreover, the Lindley distribution fits the data well.
Operational Lifetime Data. Angus [11] gave 20 operational lifetimes in hours, namely:
6278, 3113, 5236, 11584, 12628, 7725, 8604, 14266, 6125, 9350,
3212, 9003, 3523, 12888, 9460, 13431, 17809, 2812, 11825, 2398.
These data are analyzed in [6] (Example 6.3.1 and Example 11.7.1), where it is found that a test for exponentiality is significant at the 0.05 level while several tests for the generalized Pareto distribution are not.
In testing for the Lindley distribution bootstrap p-values for the tests based on V 2 2 , V 3 2 , S and AD are 0.14, 0.27, 0.17 and 0.30 respectively. Using the asymptotic chi-squared distributions the p-values for V 2 2 , V 3 2 and S are 0.21, 0.44 and 0.34 respectively. For this sample size there is not good agreement between the p-values based on the bootstrap and the asymptotic chi-squared distributions. Moreover, it appears that, unlike the exponential distribution, the Lindley fits the data well.
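As a quick check on these data, the moment/ML estimate of θ can be reproduced in a few lines of Python (our sketch; the bootstrap p-values quoted above require the full simulation and are not recomputed here):

```python
import math

# Angus [11] operational lifetimes in hours
lifetimes = [6278, 3113, 5236, 11584, 12628, 7725, 8604, 14266, 6125, 9350,
             3212, 9003, 3523, 12888, 9460, 13431, 17809, 2812, 11825, 2398]

xbar = sum(lifetimes) / len(lifetimes)   # 8563.5 hours
# closed-form estimator from Section 2
theta_hat = (1.0 - xbar + math.sqrt((xbar - 1.0)**2 + 8.0 * xbar)) / (2.0 * xbar)
```

Because the lifetimes are measured in hours, x̄ is large and θ̂ is small (about 2.3 × 10⁻⁴), close to the small-θ approximation 2/x̄ that follows from μ ≈ 2/θ.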

6. Conclusions

Tests of fit are an important element in determining the suitability of statistical models. They give objective model assessments, which may be complemented by subjective graphical methods such as Q-Q plots. Comparison of statistics such as likelihoods can show that one model is a better fit than another, but not whether either model is a good fit or not. We have given a smooth test of fit statistic S for the Lindley distribution. Two datasets illustrate the applicability of the Lindley distribution, which provides a good model for both datasets, while each of the exponential models fails. For small sample sizes, we suggest that p-values be given using the parametric bootstrap. For larger sample sizes, the asymptotic χ2 distribution may be useful.

Author Contributions

Both authors jointly conceived and designed the project. D.J.B. wrote Section 3, Section 4 and Section 5; J.C.W.R. wrote the remainder.

Acknowledgments

We thank the reviewers whose insightful comments helped improve this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lindley, D.V. Fiducial distributions and Bayes theorem. J. Royal Stat. Soc. B 1958, 20, 102–107. [Google Scholar]
  2. Cakmakyapan, S.; Özel Kadilar, G. The Lindley family of distributions: properties and applications. Hacet. J. Math. Stat. 2016, 46. [Google Scholar] [CrossRef]
  3. Ghitany, M.E.; Atieh, B.; Nadarajah, S. Lindley distribution and its applications. Math. Comput. Simul. 2008, 78, 493–506. [Google Scholar] [CrossRef]
  4. Neyman, J. Smooth test for goodness of fit. Skand. Aktuarietidskr. 1937, 20, 149–199. [Google Scholar] [CrossRef]
  5. Best, D.J.; Rayner, J.C.W. Lancaster’s test of normality. J. Stat. Plan. Infer. 1985, 12, 395–400. [Google Scholar] [CrossRef]
  6. Rayner, J.C.W.; Thas, O.; Best, D.J. Smooth Tests of Goodness of Fit: Using R, 2nd ed.; Wiley: Singapore, 2009. [Google Scholar]
  7. Rayner, J.C.W.; Thas, O.; Best, D.J. Smooth tests of goodness of fit. Wiley Interdiscip. Rev. Comput. Stat. 2011, 3, 397–406. [Google Scholar] [CrossRef]
  8. Rayner, J.C.W.; Thas, O.; De Boeck, B. A generalised Emerson recurrence relation. Aust. NZ J. Stat. 2008, 50, 235–240. [Google Scholar] [CrossRef]
  9. Rippon, P. Application of Smooth Tests of Goodness of Fit to Generalized Linear Models. Ph.D. Thesis, University of Newcastle, Callaghan, Australia, 2013. [Google Scholar]
  10. Jarque, C.M.; Bera, A.K. A test for normality of observations and regression residuals. Int. Stat. Rev. 1987, 55, 163–177. [Google Scholar]
  11. Angus, J.E. Goodness-of-fit tests for exponentiality based on a loss-of-memory type functional equation. J. Stat. Plan. Infer. 1982, 6, 241–251. [Google Scholar] [CrossRef]
Table 1. 5% critical values based on 100,000 simulations of samples of size n for V2², V3² and S when θ = 0.5 and 1.5.

| n | V2² (θ = 0.5) | V3² (θ = 0.5) | S (θ = 0.5) | V2² (θ = 1.5) | V3² (θ = 1.5) | S (θ = 1.5) |
|---|---|---|---|---|---|---|
| 20 | 2.59 | 1.93 | 4.38 | 2.65 | 1.93 | 4.33 |
| 100 | 3.47 | 2.71 | 5.06 | 3.41 | 2.52 | 4.98 |
| 200 | 3.69 | 2.95 | 5.42 | 3.66 | 2.90 | 5.73 |
| 1000 | 3.83 | 3.59 | 5.72 | 3.83 | 3.28 | 5.88 |
| 10,000 | 3.89 | 3.90 | 6.02 | 3.87 | 3.89 | 6.06 |
| ∞ | 3.84 | 3.84 | 5.99 | 3.84 | 3.84 | 5.99 |

The final row gives the asymptotic χ²₁ and χ²₂ 5% points, 3.84 and 5.99.
Table 2. Powers of tests based on V2², V3², S and AD for α = 0.05 and n = 20.

| Alternative | V2² | V3² | S | AD |
|---|---|---|---|---|
| Lindley (0.5) | 0.05 | 0.05 | 0.05 | 0.05 |
| χ² (0.75) | 0.83 | 0.76 | 0.88 | 0.92 |
| χ² (1) | 0.63 | 0.62 | 0.72 | 0.80 |
| χ² (2) | 0.18 | 0.16 | 0.18 | 0.15 |
| χ² (3) | 0.06 | 0.04 | 0.06 | 0.05 |
| χ² (4) | 0.08 | 0.10 | 0.09 | 0.09 |
| χ² (8) | 0.53 | 0.55 | 0.59 | 0.60 |
| Weibull (0.8) | 0.43 | 0.34 | 0.44 | 0.43 |
| Weibull (1.5) | 0.21 | 0.26 | 0.27 | 0.28 |
| Weibull (2.0) | 0.75 | 0.84 | 0.84 | 0.84 |
| Beta (1, 2) | 0.14 | 0.18 | 0.18 | 0.15 |
| Beta (1, 3) | 0.07 | 0.11 | 0.11 | 0.11 |
| Beta (2, 3) | 0.87 | 0.93 | 0.94 | 0.90 |
| Uniform (0, 1) | 0.51 | 0.57 | 0.57 | 0.54 |
Table 3. Powers of tests based on V2², V3², S and AD for α = 0.05 and n = 50.

| Alternative | V2² | V3² | S | AD |
|---|---|---|---|---|
| Lindley (0.5) | 0.05 | 0.05 | 0.05 | 0.05 |
| χ² (0.75) | 0.94 | 0.75 | 0.97 | 0.99 |
| χ² (1.5) | 0.63 | 0.50 | 0.70 | 0.73 |
| χ² (2) | 0.30 | 0.20 | 0.31 | 0.27 |
| χ² (3) | 0.06 | 0.06 | 0.06 | 0.08 |
| χ² (4) | 0.15 | 0.16 | 0.18 | 0.23 |
| χ² (8) | 0.93 | 0.94 | 0.96 | 0.99 |
| Weibull (0.8) | 0.74 | 0.55 | 0.78 | 0.81 |
| Weibull (1.5) | 0.56 | 0.55 | 0.60 | 0.65 |
| Weibull (1.7) | 0.90 | 0.89 | 0.93 | 0.96 |
| Beta (1, 2) | 0.52 | 0.42 | 0.51 | 0.50 |
| Beta (1, 3) | 0.24 | 0.19 | 0.24 | 0.24 |
Table 4. Powers of tests based on V2², V3², S and AD for α = 0.05 and n = 100.

| Alternative | V2² | V3² | S | AD |
|---|---|---|---|---|
| Abs N(0, 1) | 0.55 | 0.20 | 0.47 | 0.49 |
| χ² (2) | 0.41 | 0.26 | 0.42 | 0.43 |
| χ² (3) | 0.67 | 0.63 | 0.67 | 0.76 |
| χ² (4) | 0.34 | 0.27 | 0.35 | 0.48 |
| Weibull (1.5) | 0.90 | 0.84 | 0.91 | 0.96 |
| Beta (1, 2) | 0.96 | 0.64 | 0.89 | 0.85 |
| Beta (1, 3) | 0.65 | 0.33 | 0.57 | 0.50 |
