Abstract
The exponential dispersion model (EDM) generated by the Landau distribution, denoted EDM-EVF (exponential variance function), belongs to the Tweedie scale with power infinity. Its density function has no explicit form and, until now, it has not been used for statistical modeling. Of all EDMs belonging to the Tweedie scale, only two are steep and supported on the whole real line: the normal EDM, with constant variance function, and the EDM-EVF. All other absolutely continuous steep EDMs in the Tweedie scale are supported on the positive real line. This paper aims to complete the overall picture of generalized linear model (GLM) applications on the Tweedie scale by including the EDM-EVF. It introduces all the GLM ingredients needed for the analysis, including the respective link function and the total and scaled deviances. We study the analysis of deviance, derive the asymptotic properties of the maximum likelihood estimators (MLEs) of the covariate parameters, and obtain the asymptotic distribution of the deviance using the saddlepoint approximation. We provide numerical studies, which include an estimation algorithm, simulation studies, and applications to three real datasets, and demonstrate that the GLM based on the EDM-EVF outperforms the linear model based on the normal EDM. An R package accompanies all of these.
Keywords: exponential dispersion model; generalized linear model; exponential variance function; small-dispersion asymptotics; saddlepoint approximation; analysis of deviance
MSC: 62J12; 62F03; 62F12
1. Introduction
The (reproductive) TBE class (known as the Tweedie class, cf., [1]) is composed of all exponential dispersion models (EDMs) with variance functions (VFs) of the form

V_p(m) = φ m^p, m ∈ Ω_p,

where m is the mean, Ω_p is the mean parameter space, φ > 0 is the dispersion parameter, and p is the power parameter (cf., [2,3] and the references cited therein).
Let F_p be an EDM belonging to the TBE class with power parameter p. Also, let C_p and Ω_p denote, respectively, the convex support and mean parameter space of F_p. Among the TBE class, the subclasses containing all absolutely continuous (with respect to the Lebesgue measure) models comprise the following cases (cf., [2]):
- When p < 0, F_p is generated by a stable distribution with a stable index in (1, 2), supported on the whole real line, with Ω_p = (0, ∞), i.e., a proper subset of int C_p = ℝ (the interior of C_p), for all p < 0.
- When p = 0, F_0 is the normal EDM with constant VF and Ω_0 = int C_0 = ℝ.
- When p = 2, F_2 is the gamma EDM with V(m) = m² and Ω_2 = int C_2 = (0, ∞).
- When p > 2, F_p is generated by a positive stable distribution with a stable index in (0, 1), supported on the positive real line, with Ω_p = int C_p = (0, ∞), for all p > 2.
- When p = ∞, F_∞ is the EDM generated by the Landau distribution, supported on ℝ with Ω_∞ = int C_∞ = ℝ. It is absolutely continuous with respect to the Lebesgue measure on ℝ and is the limit of EDMs having power VFs (see [2,3] for further details).
Two important aspects related to the above TBE models should be remarked at this point:
- Complexity of the density function. Except for the normal (p = 0), gamma (p = 2), and inverse Gaussian (p = 3) EDMs, no other TBE model possesses an explicit density in terms of algebraic functions. All such densities can only be expressed in integral form or as power series; hence, their evaluation becomes rather complicated. To resolve the problem, several studies have directly employed the saddlepoint approximation for density evaluation on the TBE scale, as discussed in [4,5,6,7,8]. The saddlepoint approximation does so by substituting the part of the density that lacks a closed-form representation with a simple analytic expression. Additionally, the saddlepoint approximation can be utilized instead of traditional likelihood methods to derive maximum likelihood estimates (MLEs) (cf., [6,9]). Dunn created and maintains the tweedie R package [10], while [11] contributed to and maintains the statmod R package. In this frame, the function tweedie.profile in the tweedie R package practically enables the fit of TBE models. These packages can be extended to include the EDM-EVF as well.
- Steepness. The model F_p is called steep if Ω_p = int C_p. Steepness is an essential property in two respects: (1) First, it is related to the existence of the MLE of m. Indeed, if F_p is steep and Y_1, …, Y_n are n i.i.d. random variables drawn from F_p, then the MLE of m, denoted by m̂, exists with probability one and is given by the sample average Ȳ (cf., [12], Theorem 9.29). (2) Second, steepness is a necessary condition for applying generalized linear model (GLM) methodology to EDMs (cf., [2,6,13]). Consequently, out of the absolutely continuous models described above, only those with p = 0, p = 2, p > 2, and p = ∞ are steep, as their mean parameter space equals the interior of their convex support (i.e., Ω_p = (0, ∞) for p ≥ 2 and Ω_p = ℝ for p ∈ {0, ∞}). For any p < 0, F_p is not steep, as its mean parameter space Ω_p = (0, ∞) is a proper subset of the interior of its convex support, int C_p = ℝ.
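To make the role of the saddlepoint approximation concrete, the sketch below (Python, not the authors' R code; the gamma example and all function names are ours) compares the exact density of the p = 2 (gamma) member, which does have a closed form, with the generic EDM saddlepoint surrogate f(y; m, φ) ≈ (2πφV(y))^(−1/2) exp{−d(y, m)/(2φ)}. The agreement sharpens as the dispersion φ shrinks, which is exactly why the surrogate is useful for the members whose normalizer is intractable.

```python
import math

def gamma_exact(y, m, phi):
    # Reproductive gamma EDM with mean m and dispersion phi:
    # shape 1/phi, scale m*phi (so the variance is phi * m^2)
    a = 1.0 / phi
    return y ** (a - 1.0) * math.exp(-y / (m * phi)) / (math.gamma(a) * (m * phi) ** a)

def gamma_saddle(y, m, phi):
    # Saddlepoint surrogate: unit deviance of the gamma EDM (V(m) = m^2) is
    # d(y, m) = 2 * [ (y - m)/m - log(y/m) ]
    d = 2.0 * ((y - m) / m - math.log(y / m))
    V = y * y  # unit VF evaluated at the observation
    return math.exp(-d / (2.0 * phi)) / math.sqrt(2.0 * math.pi * phi * V)

# Relative error shrinks with the dispersion (Stirling-type accuracy)
for phi in (0.5, 0.1, 0.01):
    exact, approx = gamma_exact(1.3, 1.0, phi), gamma_saddle(1.3, 1.0, phi)
    print(phi, abs(approx / exact - 1.0))
```

For the gamma family the saddlepoint form coincides with the exact density up to replacing the gamma function by its Stirling approximation, so the relative error is roughly φ/12, independent of y.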
GLM applications for p ∈ {0, 2} are straightforward and have been analyzed in various references (cf., [13] and the references cited therein). GLM applications for p > 2 are discussed and presented by [6], who also maintains an R package (see [10]) for these EDMs. Consequently, we are left with the absolutely continuous models supported on the whole real line (p < 0, p = 0, and p = ∞). As already noted, the models for p < 0 are not steep—a fact which precludes them from being candidates for GLM analysis. This is quite unfortunate, as this subclass comprises an infinite set of absolutely continuous EDMs (with respect to the Lebesgue measure) supported on the whole real line. Thus, the only remaining steep EDMs supported on the whole real line are the normal (p = 0) and the EDM generated by the Landau distribution (p = ∞), both of which are suitable for GLM applications. The normal EDM constitutes the classical linear regression model, whereas the EDM-EVF requires further analysis by GLM methodology, an analysis that establishes the core of this paper. Such an analysis complements the results of [6] and accomplishes a complete analysis of all absolutely continuous TBE models.
The paper is organized as follows. Section 2 presents some preliminaries on natural exponential families (NEFs) and additive and reproductive EDMs. Section 3 introduces the EDM-EVF—the EDM generated by the Landau distribution—and the GLM ingredients needed for its analysis; mainly, we present its link function and its total and scaled deviances. In Section 4, we study its analysis of deviance, derive the asymptotic properties of the MLEs of the covariate parameters β, and obtain the asymptotic distribution of the deviance using the saddlepoint approximation. Section 5 includes the estimation algorithm, a brief description of our R package, and simulation studies. In Section 6, we provide the analysis of real data, including applications to three real datasets; it is demonstrated there that the GLM using the EDM-EVF performs better than the linear model based on the normal distribution. Some concluding remarks are presented in Section 7. Proofs of the statements (propositions, corollaries, and theorems) in this paper are relegated to Appendix A.
2. Preliminaries: NEFs, Mean Value Representation, and Additive and Reproductive EDMs
NEFs. The preliminaries in the sequel hold for any positive Radon measure on ℝ. Without loss of generality, we confine our introduction to the case in which h is a positive Radon measure that is absolutely continuous with respect to the Lebesgue measure on the real line. The Laplace transform of h and its effective domain are defined, respectively, by

L(θ) = ∫ exp(θy) h(y) dy and D = {θ ∈ ℝ : L(θ) < ∞}.
Let Θ = int D, and assume Θ is non-empty. Then, the NEF generated by h is defined by the densities of the form

f(y; θ) = exp{θy − k(θ)} h(y), θ ∈ Θ,   (1)

where k(θ) = log L(θ) is the cumulant transform of L. The cumulant transform is real analytic on Θ, implying that the r-th cumulant of (1) is given by k^(r)(θ). In particular, the mean, mean parameter space, and variance corresponding to (1) are given, respectively, by m = k′(θ), Ω = k′(Θ), and k″(θ). As k′ is strictly increasing, its inverse mapping ψ = (k′)^(−1) : Ω → Θ is well-defined. So, we denote by V(m) = k″(ψ(m)) the variance function (VF) corresponding to (1). The pair (V, Ω) uniquely defines the NEF generated by h within the class of NEFs (cf., [14]). Also, V is called the unit VF.
Mean value parameterization. For GLM applications and various other statistical aspects, it is necessary to express the NEF with densities (1) in terms of its mean m rather than in terms of the artificial parameter θ (for details, see [3]). Indeed, given a VF (V, Ω), the functions ψ(m) and k(ψ(m)) are primitives of 1/V(m) and m/V(m), respectively, and thus are given by

ψ(m) = ∫ dm/V(m) and k(ψ(m)) = ∫ m dm/V(m),   (2)

implying that the mean value representation of (1) is given by

f(y; m) = exp{yψ(m) − k(ψ(m))} h(y), m ∈ Ω.
Additive EDMs. The Jorgensen set Λ related to (1) is defined by (cf., [2])

Λ = {λ > 0 : L(θ)^λ is the Laplace transform of some positive Radon measure h_λ}.

The set Λ is not empty, due to convolution. Moreover, Λ = (0, ∞) if h is infinitely divisible, a valid property for all TBE members. Accordingly, the additive EDM (cf., [2]) is defined by densities of the form

f(z; θ, λ) = exp{θz − λk(θ)} h_λ(z), θ ∈ Θ, λ ∈ Λ,   (3)

where the VF corresponding to the additive EDM is given by (λV(m/λ), λΩ).
Reproductive EDMs. In general, for various statistical aspects, and particularly for GLM applications, it is more effective to represent (3) in a form resembling the normal structure. Such a representation, called the reproductive EDM, is obtained by the mapping Y = Z/λ, where Z has density (3). Then, the densities of this mapping have the form (cf., [2,6,15])

f(y; m, φ) = c(y; φ) exp{[yψ(m) − k(ψ(m))]/φ}, m ∈ Ω, φ ∈ Φ,   (4)

where φ = 1/λ is the dispersion parameter, Φ = {1/λ : λ ∈ Λ}, and S is the support of h. It is crucial to note that the structure in (4) is not suitable for the discrete case (counting measures on ℕ). This is because, for different φ's, it alters the support of h. In contrast, for the absolutely continuous case, the structure in (4) is appropriate. The VF of the reproductive EDM (4) is given by

φV(m), m ∈ Ω,   (5)

where if Λ = (0, ∞) or Λ = ℕ, then Φ = (0, ∞) or Φ = {1/n : n ∈ ℕ}, respectively.
3. GLM Applications for the EDM Generated by the Landau Distribution—Some Basics
In this section, we provide some required components for GLM applications. We first give an expression for the density. We then present the related link function and scaled deviance.
3.1. Density Function
The EDM-EVF is the EDM generated by the Landau distribution, known as the Tweedie model with power infinity, and it possesses a simple unit VF of the form V(m) = e^m, m ∈ ℝ. It is steep (Ω = int C = ℝ), infinitely divisible, skewed to the right, leptokurtic (i.e., it has fatter tails), and absolutely continuous, supported on the whole real line. It was surveyed in detail and further developed by [3], and was named there EDM-EVF (exponential VF). Its reproductive EDM density, of the form (4), is

f(y; m, φ) = c(y; φ) exp{[yψ(m) − k(ψ(m))]/φ}, y ∈ ℝ,   (6)

where the normalizing function c(y; φ) (7) can only be expressed in integral form (see [3]), and VF

φV(m) = φe^m, m ∈ ℝ.

The expressions for ψ(m) and k(ψ(m)), needed for its mean value parameterization, are obtained from (2) as

ψ(m) = −e^(−m) and k(ψ(m)) = −(m + 1)e^(−m).   (8)

Thus, the density (6) can be written as

f(y; m, φ) = c(y; φ) exp{(m + 1 − y)e^(−m)/φ}, y ∈ ℝ.   (9)
If Y follows (9), then we write Y ~ EDM-EVF(m, φ), or we use the standard EDM notation and write Y ~ ED(m, φ). The mean, variance, and cumulants of such a Y are

E(Y) = m, Var(Y) = φV(m) = φe^m, and κ_r = (r − 2)! φ^(r−1) e^((r−1)m), r ≥ 2.
3.2. Scaled Deviance and Link Function
We shall now consider two essential ingredients needed for GLM applications of the EDM-EVF (9): namely, the scaled deviance and the link function. These were introduced by [2,15] (see also [6,16]). For GLM applications, we need the following ingredients. Consider

t(y, m) = yψ(m) − k(ψ(m)) = (m + 1 − y)e^(−m).

It is evident that t(y, m) ≤ t(y, y) = e^(−y). Indeed, by taking the partial derivative of t(y, m) with respect to m and setting it to zero, we obtain

∂t(y, m)/∂m = (y − m)e^(−m) = 0,

implying that m = y maximizes t(y, m) (since the EDM-EVF is steep). Hence, the unit deviance

d(y, m) = 2[t(y, y) − t(y, m)] = 2[e^(−y) − (m + 1 − y)e^(−m)]

can be considered as a distance measure with two properties: d(y, y) = 0 and d(y, m) > 0 for y ≠ m. GLMs assume a systematic component with the linear predictor

η = Σ_{j=1}^{p} x_j β_j.

This is linked to the mean m through a link function g, such that η = g(m). For the EDM-EVF, we choose the canonical (and simple) link function

g(m) = ψ(m) = −e^(−m).
Let (y_i, x_i), i = 1, …, n, be a set of independent observations, where y_i ~ ED(m_i, φ) (assuming a single dispersion parameter) is associated with the link function g(m_i) = η_i = x_i^T β. Write X = (x_1, …, x_n)^T for the set of covariates, in which case η = Xβ, and the total and scaled deviances are given, respectively, by

D(y, m) = Σ_{i=1}^{n} d(y_i, m_i)

and

D*(y, m) = D(y, m)/φ.

Consequently, the log-likelihood is

ℓ(β, φ) = Σ_{i=1}^{n} log c(y_i; φ) + (1/φ) Σ_{i=1}^{n} e^(−y_i) − D*(y, m)/2.

Let β̂ be the MLE of β. As in linear models, we aim to estimate β and obtain its asymptotic behavior.
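The ingredients above can be coded directly (a Python sketch, not the authors' TBEinf package; the closed forms for ψ, k(ψ(m)), and the unit deviance are our own derivation under the exponential unit VF V(m) = e^m, obtained by integrating 1/V and m/V):

```python
import math

def psi(m):
    # Canonical parameter: psi(m) = integral of 1/V(m) dm = -exp(-m)
    return -math.exp(-m)

def link(m):
    # Canonical link g(m) = psi(m); note that eta = g(m) is always negative
    return psi(m)

def link_inv(eta):
    # Inverse canonical link, defined for eta < 0
    return -math.log(-eta)

def unit_deviance(y, m):
    # d(y, m) = 2 [ t(y, y) - t(y, m) ] with t(y, m) = y*psi(m) - k(psi(m))
    # and k(psi(m)) = -(m + 1) exp(-m), which yields:
    return 2.0 * (math.exp(-y) - (m + 1.0 - y) * math.exp(-m))

def scaled_deviance(y, mu, phi):
    # Total deviance summed over observations, divided by the dispersion
    return sum(unit_deviance(yi, mi) for yi, mi in zip(y, mu)) / phi

# Sanity checks: d(y, y) = 0, d(y, m) > 0 for m != y, link round-trips
print(unit_deviance(1.2, 1.2))         # ~0 (up to rounding)
print(unit_deviance(1.2, 0.5) > 0.0)
print(abs(link_inv(link(0.7)) - 0.7))  # ~0
```

The canonical link constrains the linear predictor to be negative, a point that matters for the fitting algorithm later on.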
4. Asymptotic Properties
This section deals with the saddlepoint approximation and asymptotic behavior of the MLEs of the parameters involved. The section establishes the central core of the asymptotic behavior of all the statistics required for the appropriate analysis of the deviance.
4.1. Asymptotic Properties of MLE
Let us start with the saddlepoint approximation (14) below, which is essential in the asymptotic theory of GLMs. The exact distribution (9) is challenging to handle, due to the cumbersome form of (7). The saddlepoint approximation neatly circumvents it. For more details on this point, see Sections 1.5.3 and 3.5.1 in [2] and Section 5.4.3 in [6]. The following proposition presents the saddlepoint approximation for the EDM-EVF.
Proposition 1.
Let Y ~ ED(m, φ). Then, for sufficiently small φ, the saddlepoint approximation for the density of Y is given by

f(y; m, φ) ≈ (2πφV(y))^(−1/2) exp{−d(y, m)/(2φ)},   (14)

where V(y) = e^y and d(y, m) is the unit deviance.
Proof.
See Appendix A. □
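A minimal numerical sketch of the approximation (Python; it again assumes the exponential unit VF V(y) = e^y and the unit deviance d(y, m) = 2[e^(−y) − (m + 1 − y)e^(−m)] derived from it, so the formulas are ours, not the paper's stripped displays):

```python
import math

def unit_deviance(y, m):
    # Unit deviance under the assumed exponential unit VF V(m) = exp(m)
    return 2.0 * (math.exp(-y) - (m + 1.0 - y) * math.exp(-m))

def dens_saddle(y, m, phi):
    # Saddlepoint density: the intractable normalizer c(y; phi) is replaced
    # by the analytic factor (2*pi*phi*V(y))^(-1/2)
    V = math.exp(y)
    return math.exp(-unit_deviance(y, m) / (2.0 * phi)) / math.sqrt(2.0 * math.pi * phi * V)

# For small phi the approximate density concentrates near y = m and its
# total mass is close to one (crude Riemann-sum check on a wide grid)
phi, m = 0.01, 0.0
grid = [i * 0.001 for i in range(-2000, 2001)]
mass = sum(dens_saddle(y, m, phi) for y in grid) * 0.001
print(mass)
```

The mass deviates from one only by an O(φ) term, consistent with the small-dispersion character of the approximation.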
The following corollary, an immediate consequence of Proposition 1, implies convergence to normality:
Corollary 1.
Let Y ~ ED(m, φ); then,

(Y − m)/√(φV(m)) →_d N(0, 1) as φ → 0,   (15)

where V(m) = e^m and →_d denotes convergence in distribution.
Proof.
See Appendix A. □
Corollary 1 provides the asymptotic normality for a single observation y. For independent observations y = (y_1, …, y_n)^T with Y_i ~ ED(m_i, φ), we have, as φ → 0,

(Y_i − m_i)/√(φV(m_i)) →_d N(0, 1), i = 1, …, n,   (16)

where V(m_i) = e^(m_i).
Using (16), the following theorem shows that the MLE of is asymptotically normally distributed.
Theorem 1.
Let β̂ be the MLE of β and let X be the n × p design matrix. If X^T X has bounded eigenvalues, then, as φ → 0,

β̂ →_d N(β_0, φ(X^T WX)^(−1)),   (17)

where β_0 is the true parameter and W = diag(w_1, …, w_n), with w_i = 1/[V(m_i)g′(m_i)²], is the matrix of working weights.
Proof.
See Appendix A. □
4.2. Analysis of the Deviance
With m and φ known, we consider the distribution of the deviance. We claim that, when the saddlepoint approximation holds (and it does for small φ), the scaled deviance follows an approximate chi-square distribution.
Theorem 2.
Let y = (y_1, …, y_n)^T, where the Y_i ~ ED(m_i, φ) are independent and φ is known. Then, as φ → 0, the scaled deviance satisfies D*(y, m) = D(y, m)/φ →_d χ²_n.
Proof.
See Appendix A. □
When φ is unknown, it is replaced by its MLE φ̂. Thus, we define the residual and scaled residual deviances as

D(y, m̂) = Σ_{i=1}^{n} d(y_i, m̂_i)

and

D*(y, m̂) = D(y, m̂)/φ.

As the GLM considered in Section 3 involves p regression parameters, it follows that

D*(y, m̂) →_d χ²_{n−p} as φ → 0.   (18)
Generally, the deviance is most useful not as an absolute measure of goodness-of-fit, but rather for comparing two nested models. For example, one may want to test whether incorporating an additional covariate significantly improves the model fit. In this case, the deviance can be employed to compare two nested GLMs that are based on the same EDM but have different fitted systematic components:
Model A: g(m) = Σ_{j=1}^{p} x_j β_j,

and

Model B: g(m) = Σ_{j=1}^{p} x_j β_j + x_{p+1} β_{p+1},

where β̂_A denotes the MLE of β under Model A, β̂_B denotes the MLE of β under Model B, and x_{p+1} is an additional covariate. Note that Model A is a special case of Model B, with β_{p+1} = 0. Accordingly, we consider the following hypotheses, to determine if the simpler Model A is adequate to model the data:

H_0: β_{p+1} = 0 versus H_1: β_{p+1} ≠ 0.   (19)
We have previously observed that the total deviance captures that part of the log-likelihood that depends on . Therefore, the following theorem holds, from which it can be seen that (18) is a special case of Theorem 3:
Theorem 3.
If φ is known, the likelihood ratio test (LRT) statistic for comparing Models A and B is

LRT = [D(y, m̂_A) − D(y, m̂_B)]/φ.

Then, under the null hypothesis in (19), LRT →_d χ²_1 as φ → 0.
Proof.
See Appendix A. □
Consider the two models in Theorem 3 with both and unknown. Then, an estimate of is required. This is done in Theorem 4, which is deduced from Theorem 3:
Theorem 4.
If φ is unknown, the appropriate statistic for comparing Model A with Model B is

F = [D(y, m̂_A) − D(y, m̂_B)]/φ̃,

where φ̃ is an estimate of φ based on Model B. Then, under the null hypothesis in (19), F converges in distribution as φ → 0.
Proof.
It suffices to prove the asymptotic independence of and . The proof is similar to Theorem 4.3 in reference [17]. □
Note that our statements above about asymptotic distributions are all based on the assumption that φ → 0. Such results are called small-dispersion asymptotics and hold regardless of the sample size n. Large-sample asymptotics are also well-known and, hence, no further explanations are provided.
5. Simulation Studies
5.1. Implementation
Herein, we discuss the estimation of the unknown parameters in the GLM: the covariate coefficients β and the dispersion parameter φ. For the estimation of β, we use iteratively reweighted least squares (IRLS). The score vector for β is

U(β) = (1/φ) X^T WM(y − m),

where W = diag(w_1, …, w_n) and w_i = 1/[V(m_i)g′(m_i)²] are called the working weights, and M is the diagonal matrix of the link derivatives g′(m_i). The Fisher information matrix for β is

I(β) = (1/φ) X^T WX.

Thus, an iterative technique using the Newton–Raphson method yields

β^(r+1) = β^(r) + I(β^(r))^(−1) U(β^(r)),

where the Fisher information matrix is used in place of the observed information matrix, and the superscript (r) denotes the r-th iterate. The iteration can be re-organized as IRLS (cf., [6]):

β^(r+1) = (X^T WX)^(−1) X^T Wz,

where z, the working response vector, is given by

z = η + M(y − m),

and all other quantities on the right-hand side are evaluated at β^(r).
For the estimation of φ, we use the mean deviance estimator of [6]. It is easy to show that, under the saddlepoint approximation density, the MLE of φ is the simple mean deviance D(y, m̂)/n. Taking into account the estimation of β and the residual degrees of freedom, we obtain the mean deviance estimator of φ as

φ̄ = D(y, m̂)/(n − p).
We summarize all of the above as Algorithm 1.
Algorithm 1: Estimating β and φ based on iteratively reweighted least squares estimation (IRLSE) and the mean deviance.
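The IRLS update and the mean deviance estimator can be sketched end to end (Python, not the authors' TBEinf package; the canonical link g(m) = −e^(−m) and the weights w_i = 1/[V(m_i)g′(m_i)²] = e^(m_i) follow from the exponential-VF assumption, and the clamp on η is a purely numerical safeguard we added):

```python
import numpy as np

def irls_edm_evf(X, y, tol=1e-8, max_iter=50):
    """IRLS for a GLM with canonical link g(m) = -exp(-m) and V(m) = exp(m)."""
    mu = y.astype(float).copy()                 # start from the saturated fit
    eta = -np.exp(-mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        g_prime = np.exp(-mu)                   # g'(m) = exp(-m)
        w = np.exp(mu)                          # w = 1 / (V(m) g'(m)^2) = exp(m)
        z = eta + g_prime * (y - mu)            # working response
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
        eta = np.minimum(X @ beta_new, -1e-10)  # keep eta < 0 (numerical safeguard)
        mu = -np.log(-eta)                      # inverse canonical link
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    # Mean-deviance estimate of the dispersion: D(y, mu_hat) / (n - p)
    dev = 2.0 * np.sum(np.exp(-y) - (mu + 1.0 - y) * np.exp(-mu))
    return beta, dev / (len(y) - X.shape[1])

# Small-dispersion synthetic check: by the normal approximation,
# Y is roughly N(m, phi * exp(m)) for small phi
rng = np.random.default_rng(0)
n, beta_true, phi = 500, np.array([-1.0, -0.5]), 0.01
X = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, n)])
m = -np.log(-(X @ beta_true))
y = m + np.sqrt(phi * np.exp(m)) * rng.standard_normal(n)
beta_hat, phi_hat = irls_edm_evf(X, y)
print(beta_hat, phi_hat)
```

Starting from the saturated fit μ = y makes the initial working response well-defined without any extra tuning, and the recovered coefficients and dispersion should sit close to the true values when φ is small.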
We have developed an R package named TBEinf [18], which is used in our numerical experiments and is publicly available at https://github.com/xliusufe/TBEinf (accessed on 28 April 2024).
The package includes a program for computing the density of the EDM-EVF by direct calculation (cf., [3]), saddlepoint approximation (cf., [6]), Fourier inversion (cf., [2,16]), and the modified W-transformation (cf., [16,19]). Specifically, the function dTBEinf in the package computes the exact density when method = "real", the saddlepoint approximation when method = "saddle", the Fourier inversion when method = "finverse", and the modified W-transformation when method = "mWtrans".
Also, the package applies GLM methodology to the EDM-EVF for estimation and prediction. The estimates of the covariate coefficients β and the dispersion parameter φ are obtained through Algorithm 1.
5.2. Simulation Studies
Firstly, we generated simulated data using (16). We considered four sample sizes, up to n = 800, with fixed true values of β and φ. The first column of X was a vector of ones, and all the other elements were randomly generated. We ran 1000 repetitions, generating 1000 datasets for each n, and estimated β and φ according to Algorithm 1.

By applying Algorithm 1, we obtained the average value of the estimated φ for each sample size. These averages were relatively small compared to the true value of φ, because the total deviance was very small.
Table 1 lists the simulation results calculated by applying Algorithm 1 with varying sample sizes n. Herein, sd denotes the standard deviation (SD) of the 1000 estimates β̂_k^(j), j = 1, …, 1000, of each coefficient β_k around their average. Also, se denotes the standard error (SE) of β̂_k, calculated from (17) as the square root of the k-th diagonal element of the estimated covariance matrix, using the average value of the estimated φ. From Table 1, we can see that the average bias and the SDs were small, which demonstrates that the estimation procedure performs well and stably. It is also observed that the SDs were all close to the SEs, and that both decreased as n increased, which verifies that the asymptotic properties are reasonable.
Table 1.
Average bias, sd, and se of the estimated β for each sample size n. β_0 denotes the true value of β; the first component is the intercept term.
6. Real Data Analysis
We present the proposed estimation procedure through applications to three real datasets. The first and second datasets, grazing and hcrabs, are both from the R package 'GLMsData' (see [6,20]). The last one is the Boston housing data.
6.1. Dataset “Grazing”
This dataset reveals the density of understorey birds across a series of sites located on either side of a stockproof fence, in two distinct areas. It has the potential to provide insights into the impact of habitat fragmentation on bird populations (cf., [20]):
- Sample size: n = 62;
- The number of variables: ;
- Variables description; see Table 2.
Table 2. Variables description of grazing dataset.
To verify the appropriateness of the GLM for the data, we evaluated the prediction performance of the GLM and compared it with a linear model. We conducted 500 random splits of the 62 observations. In each split, we randomly selected 80% of the observations as the training set and the remaining 13 as the testing set, where 13 is the result of multiplying 62 by 20% and rounding up. We applied both the GLM and the linear model to the training set and estimated the coefficients.
By applying Algorithm 1 for the GLM, we obtained the estimates of β and φ on each training set. We then predicted the responses on the testing set through the inverse link, ŷ_i = g^(−1)(x_i^T β̂), and calculated the mean squared error (MSE) over the 13 testing observations. In the linear model, we estimated β using least squares (without φ), predicted the testing responses by ŷ_i = x_i^T β̂, and calculated the MSE in the same way.
Thus, we could compute the average and sd of the prediction MSEs of both models over the 500 random splits. For the GLM, the average and sd of the MSEs were 0.111 and 0.017, respectively. For the linear model, they were 0.760 and 0.238, respectively. It can be seen that the GLM performed much better than the linear model in terms of both the average and the sd. Additionally, we calculated the Bayesian information criterion (BIC) for both models; the BIC for the GLM was significantly lower than that for the linear model (LM), indicating a better model fit.
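The evaluation protocol used for all three datasets can be expressed generically (a Python sketch with synthetic stand-in data, since the actual datasets live in the R packages; the function names and the OLS stand-in fitter are ours, while the 500 splits and the 80/20 proportion mirror the text):

```python
import numpy as np

def split_mse(X, y, fit, predict, n_splits=500, test_frac=0.2, seed=1):
    """Average and sd of test MSE over repeated random train/test splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = int(np.ceil(test_frac * n))  # e.g. ceil(0.2 * 62) = 13
    mses = []
    for _ in range(n_splits):
        idx = rng.permutation(n)
        test, train = idx[:n_test], idx[n_test:]
        model = fit(X[train], y[train])
        resid = y[test] - predict(model, X[test])
        mses.append(np.mean(resid ** 2))
    return np.mean(mses), np.std(mses)

# Stand-in linear-model fit/predict pair via least squares
ols_fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
ols_pred = lambda b, X: X @ b

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(62), rng.uniform(size=(62, 2))])
y = X @ np.array([1.0, -0.5, 0.3]) + 0.1 * rng.standard_normal(62)
avg, sd = split_mse(X, y, ols_fit, ols_pred, n_splits=100)
print(avg, sd)
```

Passing the fit/predict pair as arguments lets the same harness score both the EDM-EVF GLM and the linear model, which is exactly the comparison the three data analyses report.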
6.2. Dataset “Hcrabs”
This dataset describes the number of male crabs attached to female horseshoe crabs (cf., [20]):
- Sample size: n = 173;
- The number of variables: ;
- Variables description; see Table 3 below.
Table 3. Variables description of hcrabs dataset.
As with the first dataset, we conducted 500 random splits of the 173 observations. In each split, we selected 80% as the training set and the remaining 35 observations as the testing set, where 35 is the result of multiplying 173 by 20% and rounding up. We then applied both the GLM and the linear model.
By applying Algorithm 1 for the GLM, we obtained the estimates of β and φ on each training set, predicted the testing responses through the inverse link, and computed the MSE over the 35 testing observations. In the linear model, we estimated β using least squares, predicted the testing responses by ŷ_i = x_i^T β̂, and computed the MSE in the same way.
For the GLM, the average and sd of the MSEs were 0.011 and 0.004, respectively. For the linear model, they were 0.837 and 0.276, respectively. Here, again, the GLM performed much better than the linear model in terms of both the average and the sd. We also calculated the BIC for both models; the BIC for the GLM was lower than that for the linear model, indicating a superior model fit.
6.3. Dataset “Boston Housing”
This dataset is taken from Harrison Jr and Rubinfeld (1978) and includes 14 variables measured across 506 census tracts in the Boston Standard Metropolitan Statistical Area. The response variable is the logarithm of the median value of the houses in those census tracts:
- Sample size: n = 506;
- The number of variables: 14;
- Variables description; see Table 4.
Table 4. Variables description of Boston housing dataset.
Again, we conducted 500 random splits of the 506 observations. In each split, we selected 80% as the training set and the remaining 102 observations as the testing set, where 102 is the result of multiplying 506 by 20% and rounding up. We applied both the GLM and the linear model and compared their performance.
For the GLM, we obtained the estimates of β and φ on each training set, predicted the testing responses through the inverse link, and computed the MSE over the 102 testing observations. In the linear model, we estimated β using least squares, predicted the testing responses by ŷ_i = x_i^T β̂, and computed the MSE in the same way.
For the GLM, the average and sd of the MSEs were 0.031 and 0.009, respectively. For the linear model, they were 0.041 and 0.010, respectively. This dataset is thus well suited to the linear model, yet our GLM also fitted it well, which, to some extent, reflects the wide applicability of the GLM; indeed, the GLM results were slightly better than those of the linear model in both average and sd. We also computed the BIC for both models; the lower BIC of the GLM indicated a superior model fit.
7. Conclusions
In this paper, we were interested in GLM methodology applied to the EDM-EVF—the EDM generated by the Landau distribution, an EDM supported on the whole real line. We introduced its density function, deviance, and link function. We considered the saddlepoint approximation approach for its density and then deduced the convergence of Y to normality. Based on the small-dispersion and saddlepoint approximations, we derived the asymptotic normality of the MLE of β. The analysis of deviance was also studied, considering the cases of known and unknown φ. In the numerical studies, we first estimated β and φ using Algorithm 1 and then evaluated the estimation performance, reporting averages of bias, standard deviations (SDs), and standard errors (SEs) in a simulation study. We demonstrated that the biases and SDs were relatively small and that the SDs were close to the SEs. As for the applications to the three real datasets, the GLM showed much better performance than the linear models, which, to some extent, indicates the wide applicability of the EDM-EVF. We also composed an R package for GLM applications of the EDM-EVF.
We trust that the proposed GLM will be well utilized in modeling more real data and various statistical purposes.
Author Contributions
Conceptualization, S.K.B.-L.; methodology, S.K.B.-L., X.L. and Z.X.; software, X.L. and Z.X.; validation, X.L. and Z.X.; formal analysis, X.L. and Z.X.; investigation, S.K.B.-L., X.L., A.R. and Z.X.; resources, S.K.B.-L. and X.L.; data curation, S.K.B.-L., X.L. and Z.X.; writing—original draft preparation, S.K.B.-L., X.L. and Z.X.; writing—review and editing, S.K.B.-L., X.L., A.R. and Z.X.; visualization, X.L. and Z.X.; supervision, S.K.B.-L. and X.L.; project administration, S.K.B.-L.; funding acquisition, X.L. All authors have read and agreed to the published version of the manuscript.
Funding
The research of Liu and Xiang was funded by the National Natural Science Foundation of China (12271329, 72331005), the Program for Innovative Research Team of SUFE, the Shanghai Research Center for Data Science and Decision Technology, the Open Research Fund of the Yunnan Key Laboratory of Statistical Modeling and Data Analysis, Yunnan University, and the Open Research Fund of the Key Laboratory of Analytical Mathematics and Applications (Fujian Normal University), Ministry of Education, P. R. China. The research of Bar-Lev and Ridder was funded by STAR (Stochastics—Theoretical and Applied Research), one of the four mathematics clusters within the Dutch Research Council (NWO).
Data Availability Statement
All real datasets used in this manuscript are explicitly displayed in the paper.
Acknowledgments
We thank two reviewers for helpful comments and suggestions.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| EDM | exponential dispersion model |
| EVF | exponential variance function |
| VF | variance function |
| TBE | Tweedie, Bar-Lev, and Enis |
| NEF | natural exponential family |
| LRT | likelihood ratio test |
| IRLS | iteratively reweighted least squares |
| SD | standard deviation |
| SE | standard error |
Appendix A. Proofs
Proof of Proposition 1.
For , by steepness and (6), the characteristic function of Y is
where . The last equation holds, since the integrand is an EDM density function written in terms of rather than . If is absolutely integrable, then by the Fourier inversion theorem the probability density function of Y is
where i is the complex imaginary unit and .
Since and is the inverse mapping of , we have . Let ; then, . Since the integrand is analytic, we may move the path of integration from to . The density then becomes
We introduce the unit deviance,
Let ; then, for every fixed t, . By expanding k around , we obtain
where , since and .
We now consider the term in curly brackets in (A1). By introducing the unit deviance and expanding k around , this term becomes
where high-order terms of are discarded. From the result
we obtain the approximation for small enough,
This completes the proof of Proposition 1 (cf., [2]). □
Proof of Corollary 1.
Before proving this, we prove a lemma: the unit scaled deviance behaves approximately as the normal unit deviance near its minimum (cf., [2]). Let
For , we first show that
By a simple calculation, we have
Thus, (A2) holds. Then, the unit variance function satisfies the relationship
Furthermore, (A3) implies the following second-order Taylor expansion of near its minimum ():
This expansion shows that the unit deviance behaves approximately as does the normal unit deviance near its minimum.
Proof of Theorem 1.
First, we prove the consistency of , i.e., as , where is the true parameter. We shall consider the behavior of the log-likelihood on the sphere with center at the true point and radius h. We will show that for any sufficiently small h, the probability of
tends to 1 at all points on the surface of , i.e., (cf., [17]). Note that this method also handles the proof of the MLE’s consistency in large-sample asymptotics.
We denote
Through differential operations, we obtain
Let
where . Obviously, is negative definite. By (16), we know as (by use of the Chebyshev inequality); then
For sufficiently small h, by expanding around the true point and multiplying by , we have
We now consider . Suppose that has bounded eigenvalues and its maximum eigenvalue is . By (A5) and the Cauchy–Schwarz inequality, we have
Consider . For each , by (16) and the Chebyshev inequality we obtain, by letting ,
that is,
implying that
The last term of the above inequality tends to 0, since , and the other terms are constants. Thus, we have
That is, with probability tending to 1. Returning to , we have, with probability tending to 1,
The above argument is based on convergence in distribution. In fact, the same conclusion can be achieved through convergence in probability. It is crucial to note that the argument above demonstrates how convergence in probability can be obtained from convergence in distribution as φ → 0. This allows us to apply convergence in probability directly below and omit any further mention of convergence in distribution.
We now consider . We have
For the first term, we note that by (A6), . We use an argument analogous to that used for but replace the Chebyshev inequality with the definition of convergence in probability. Thus, the absolute value of the first term is less than a constant multiple of with probability tending to 1. The second term is a negative quadratic form in . Let . Then, by a straightforward calculation, we have
Thus, is negative and we obtain
So, with (A7)–(A9), we have
for sufficiently small h.
Because is continuous and differentiable on , there must be a local maximum point that satisfies
Combining this with (A10), we obtain
So, when , we have
Now, we will show that , where S is a covariance matrix. Denote
By expanding around the true point , we obtain
where higher-order terms are ignored. Replace with (we can do this since ), and then note that the left side of the equation is . Rearranging this equation, we have
Proof of Theorem 2.
First, we show the unit deviance follows an approximate . By Proposition 1, the moment-generating function (MGF) of the unit deviance is approximately
where . Since the integrand is the (saddlepoint) density of the distribution with , we have
which is identical to the MGF of a . So, as we have
For the set of observations y = (y_1, …, y_n)^T, where the Y_i ~ ED(m_i, φ) are independent, we have d(Y_i, m_i)/φ →_d χ²_1 for each i. Then, by independence, we obtain

D(y, m)/φ = Σ_{i=1}^{n} d(Y_i, m_i)/φ →_d χ²_n,

which completes the proof of Theorem 2 (cf., [6]). □
Proof of Theorem 3.
We consider the four nested hypotheses (cf., [17]):
- (the saturated hypothesis);
- ;
- ;
of dimensions respectively, where .
Since we proved the asymptotic normality of in Theorem 1, just as in Theorems 10.3.1 and 10.3.3 of [21], we can prove that the likelihood ratio test (LRT) statistic follows asymptotically a chi-square distribution by starting from the simple hypothesis and moving on to the composite hypothesis. That is, for LRT , we have , where q is the corresponding degrees of freedom.
References
- Bar-Lev, S.K. Independent tough identical results: The Tweedie class on power variance functions and the class of Bar-Lev and Enis on reproducible natural exponential families. Int. J. Stat. Probab. 2020, 9, 30–35. [Google Scholar] [CrossRef]
- Jørgensen, B. The Theory of Dispersion Models; Chapman and Hall: London, UK, 1997. [Google Scholar]
- Bar-Lev, S.K. The Exponential Dispersion Model Generated by the Landau Distribution—A Comprehensive Review and Further Developments. Mathematics 2023, 11, 4343. [Google Scholar] [CrossRef]
- Dunn, P.K.; Smyth, G.K. Tweedie Family Densities: Methods of Evaluation. In Proceedings of the 16th International Workshop on Statistical Modelling, Odense, Denmark, 2–6 July 2001. [Google Scholar]
- Dunn, P.K.; Smyth, G.K. Series Evaluation of Tweedie Exponential Dispersion Model Densities. Stat. Comput. 2005, 15, 267–280. [Google Scholar] [CrossRef]
- Dunn, P.K.; Smyth, G.K. Generalized Linear Models with Examples in R; Springer: Berlin/Heidelberg, Germany, 2018. [Google Scholar]
- Hougaard, P. Nonlinear Regression and Curved Exponential Families. Improvement of the Approximation to the Asymptotic Distribution. Metrika 1995, 42, 191–202. [Google Scholar] [CrossRef]
- Chen, Z.; Pan, E.; Xia, T.; Li, Y. Optimal degradation-based burn-in policy using Tweedie exponential-dispersion process model with measurement errors. Reliab. Syst. Saf. 2020, 195, 106748. [Google Scholar] [CrossRef]
- Ricci, L.; Martínez, R. Adjusted R2-type measures for Tweedie models. Comput. Stat. Data Anal. 2008, 52, 1650–1660. [Google Scholar] [CrossRef]
- Dunn, P.K. Tweedie: Evaluation of Tweedie Exponential Family Models, R Package Version 2.3.5; 2022. Available online: https://cran.r-project.org/web/packages/tweedie/tweedie.pdf (accessed on 12 September 2023).
- Smyth, G.K. Statmod: Statistical Modeling, R Package Version 1.4.30; 2017. Available online: https://CRAN.R-project.org/package=statmod (accessed on 3 April 2024).
- Barndorff-Nielsen, O. Information and Exponential Families in Statistical Theory; Wiley: New York, NY, USA, 1978. [Google Scholar]
- Merz, M.; Wüthrich, M.V. Statistical Foundations of Actuarial Learning and Its Applications; Springer: Berlin/Heidelberg, Germany, 2023. [Google Scholar]
- Morris, C.N. Natural exponential families with quadratic variance functions. Ann. Statist. 1982, 10, 65–80. [Google Scholar] [CrossRef]
- McCullagh, P.; Nelder, J.A. Generalized Linear Models, 2nd ed.; Chapman and Hall: London, UK, 1989. [Google Scholar]
- Dunn, P.K.; Smyth, G.K. Evaluation of Tweedie exponential dispersion model densities by Fourier inversion. Stat. Comput. 2008, 18, 73–86. [Google Scholar] [CrossRef]
- Jørgensen, B. Small dispersion asymptotics. Braz. J. Probab. Stat. 1987, 1, 59–90. [Google Scholar]
- Liu, X.; Xiang, Z.; Bar-Lev, S.K.; Ridder, A. TBEinf, R Package Version 0.0.1; 2024. Available online: https://github.com/xliusufe/TBEinf (accessed on 28 April 2024).
- Sidi, A. A user-friendly extrapolation method for oscillatory infinite integrals. Math. Comput. 1988, 51, 249–266. [Google Scholar] [CrossRef]
- Dunn, P.K.; Smyth, G.K. GLMsData: Generalized Linear Model Data Sets, R Package Version 1.0.0; 2017. Available online: https://CRAN.R-project.org/package=GLMsData (accessed on 12 April 2024).
- Casella, G.; Berger, R.L. Statistical Inference; Thomson Learning Inc.: Duxbury, MA, USA, 2002. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).