Article

Likelihood Based Inference and Bias Reduction in the Modified Skew-t-Normal Distribution

by Jaime Arrué 1, Reinaldo B. Arellano-Valle 2, Enrique Calderín-Ojeda 3, Osvaldo Venegas 4,* and Héctor W. Gómez 1

1 Departamento de Estadística y Ciencias de Datos, Facultad de Ciencias Básicas, Universidad de Antofagasta, Antofagasta 1240000, Chile
2 Departamento de Estadística, Facultad de Matemáticas, Pontificia Universidad Católica de Chile, Santiago 7820436, Chile
3 Centre for Actuarial Studies, Department of Economics, The University of Melbourne, Melbourne, VIC 3010, Australia
4 Departamento de Ciencias Matemáticas y Físicas, Facultad de Ingeniería, Universidad Católica de Temuco, Temuco 4780000, Chile
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(15), 3287; https://doi.org/10.3390/math11153287
Submission received: 29 May 2023 / Revised: 11 July 2023 / Accepted: 13 July 2023 / Published: 26 July 2023
(This article belongs to the Section Probability and Statistics)

Abstract

In this paper, likelihood-based inference and bias correction based on Firth's approach are developed for the modified skew-t-normal (MStN) distribution. The latter model exhibits greater flexibility than the modified skew-normal (MSN) distribution, since it is able to model heavily skewed data and thick tails. In addition, the tails are controlled by the shape parameter and the degrees of freedom. We provide the density of this new distribution and present some of its more important properties, including a general expression for the moments. The Fisher information matrix, together with the observed information matrix associated with the log-likelihood, is also given. Furthermore, the non-singularity of the Fisher information matrix of the MStN model is demonstrated when the shape parameter is zero. As the MStN model presents an inferential problem for the shape parameter, Firth's method for bias reduction is applied both in the scalar case and in the case with location and scale parameters.

1. Introduction

Arellano-Valle et al. [1] introduced the skew-generalized-normal (SGN) distribution. We say that a random variable Z follows a SGN distribution, denoted by Z ∼ SGN(λ₁, λ₂), if its probability density function (pdf) takes the following form:
$$ f(z;\lambda_1,\lambda_2) = 2\,\phi(z)\,\Phi\!\left(\frac{\lambda_1 z}{\sqrt{1+\lambda_2 z^2}}\right), \quad z\in\mathbb{R}, \tag{1} $$
where λ₁ ∈ ℝ, λ₂ ≥ 0, and φ and Φ denote the pdf and the cumulative distribution function (cdf) of the N(0, 1) distribution, respectively. For the case λ₁ = 0, the SGN distribution reduces to the standard normal distribution. On the other hand, non-zero values of the parameter λ₁ directly influence the skewness of the model. In particular, when λ₂ = 0 the SGN distribution reduces to the well-known skew-normal (SN) distribution introduced by Azzalini [2]. Moreover, it converges to the half-normal distribution when λ₁ → ∞.
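As a quick numerical check, the SGN density in (1) can be coded directly and verified to integrate to one (a minimal sketch using SciPy; the function name `sgn_pdf` is ours, not from the paper):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def sgn_pdf(z, lam1, lam2):
    """SGN density (1): 2 phi(z) Phi(lam1 * z / sqrt(1 + lam2 * z^2))."""
    return 2.0 * norm.pdf(z) * norm.cdf(lam1 * z / np.sqrt(1.0 + lam2 * z ** 2))

# integrates to one for any lam1 in R, lam2 >= 0,
# and reduces to the standard normal density when lam1 = 0
area = quad(lambda z: sgn_pdf(z, 1.5, 2.0), -np.inf, np.inf)[0]
```

With λ₁ = 0 the call returns exactly the N(0, 1) density, matching the reduction noted above.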
The SGN distribution has been previously used in the literature. For instance, Sever et al. [3] used this model in the context of discriminant analysis; Arnold et al. [4] considered the bivariate case; later, Gómez et al. [5] examined the skew-curved-normal distribution, which is a subfamily of the SGN distribution. The singularity of the Fisher information matrix was examined by Arellano-Valle et al. [6]; in that paper it was concluded that, for the SGN model with location and scale parameters, the Fisher information matrix is singular in the particular case when normality is restored (λ₁ = 0).
Arrué et al. [7] considered bias reduction of the ML estimate of the shape parameter λ₁ in the SGN distribution when λ₂ = 1. This submodel was named the modified skew-normal (MSN) distribution. They also showed that the Fisher information matrix of the MSN model is non-singular when the shape parameter is zero. However, the divergence of the ML estimator of the shape parameter behaves in a similar way to that in the SN model (see Sartori [8]). Hence, they applied the method introduced by Firth [9] to reduce the bias of the estimator of the shape parameter.
The MSN model is a convenient alternative to the SN model because, in addition to regulating the asymmetry by means of a single parameter, it allows us to apply regular asymptotic theory to study the behavior of the MLE around normality. However, these two models are not flexible enough to fit data from asymmetric distributions with heavy tails. With the purpose of producing robust inferences in such situations, in this work we consider a distribution that has more flexibility than the MSN distribution. Specifically, we consider the modified skew-t-normal (MStN) distribution, which extends the MSN distribution (the latter is a limiting case when the degrees of freedom tend to infinity) and is consequently a more flexible model. In addition, it is a very useful model to describe data that depart from normality. On the other hand, the problem with the estimation of the shape parameter persists in this new distribution. For that reason, the main goal of this work is to examine bias reduction using Firth's method in the MStN distribution, assuming that the degrees of freedom are known. It will be seen that this leads to results similar to those obtained for the MSN distribution.
The structure of the paper is as follows. In Section 2 the MStN distribution is introduced and some of its main properties are examined. In Section 3 likelihood-based inference and the non-singularity of the Fisher information matrix of this distribution are studied. Next, in Section 4, the bias reduction methodology introduced by Firth [9] is described; this approach was used by Sartori [8] for the skew-normal and skew-t distributions and by Arrué et al. [7] for the MSN distribution. This method is then applied to the shape parameter of the MStN distribution, and it is shown that the modified ML estimate is always finite. Furthermore, for the case of location, scale, and shape parameters, the methodology is applied by combining it with the ML estimates of the location and scale parameters. Two applications of this methodology to real datasets are considered in Section 5. Finally, conclusions are given in Section 6.

2. MStN Distribution

We say that a random variable Z follows a modified skew-t-normal distribution, denoted by Z ∼ MStN(λ, ν), if its pdf can be expressed as
$$ f_Z(z;\lambda,\nu) = 2\,t_\nu(z)\,\Phi(\lambda u(z)), \quad z\in\mathbb{R}, \tag{2} $$
where u(z) = z/√(1+z²), z ∈ ℝ, and λ ∈ ℝ. Here, t_ν and Φ denote the pdf of the Student's-t distribution with ν degrees of freedom and the cdf of the N(0, 1) distribution, respectively. For the particular case λ = 0, the pdf of the MStN distribution given in (2) coincides with the pdf of the Student's-t distribution. On the other hand, non-zero values of λ directly affect the symmetry and kurtosis of the model. In particular, when λ → ∞ the model converges to the half-t distribution. Additionally, when ν → ∞, the MStN distribution approaches the MSN distribution. Figure 1 shows the behavior of the pdf of the MStN distribution for different values of λ and ν.
The thick solid line corresponds to the case λ = 0, i.e., the Student's-t distribution (ν = 1 in the left panel and ν = 5 in the right panel). The two panels illustrate how skewness and kurtosis change with the values of λ and ν. Furthermore, in the limiting case ν → ∞ the distribution coincides with the MSN distribution, and the normal density is obtained when, in addition, λ = 0. We can also incorporate location and scale parameters, i.e., X = μ + σZ, with μ ∈ ℝ, σ > 0 and Z ∼ MStN(λ, ν). The new model is then denoted by X ∼ MStN(μ, σ, λ, ν), with pdf given by
$$ f_X(x;\mu,\sigma,\lambda,\nu) = \frac{2}{\sigma}\, t_\nu\!\left(\frac{x-\mu}{\sigma}\right) \Phi\!\left(\lambda\, u\!\left(\frac{x-\mu}{\sigma}\right)\right), \quad x\in\mathbb{R}. \tag{3} $$
The MStN distribution given by (3) provides a suitable parametric model for fitting empirical data from asymmetric and heavy-tailed distributions. An important property of this distribution is the existence and non-singularity of the Fisher information matrix when λ = 0. This means that we can use regular asymptotic theory to test for symmetry in the underlying distribution of the data.
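The MStN density in (3) is straightforward to implement; the sketch below (helper names `u` and `mstn_pdf` are ours) checks that it integrates to one and reduces to the Student's-t density when λ = 0:

```python
import numpy as np
from scipy.stats import norm, t
from scipy.integrate import quad

def u(z):
    return z / np.sqrt(1.0 + z ** 2)

def mstn_pdf(x, mu=0.0, sigma=1.0, lam=0.0, nu=1.0):
    """MStN(mu, sigma, lam, nu) density, Equation (3)."""
    z = (x - mu) / sigma
    return 2.0 / sigma * t.pdf(z, df=nu) * norm.cdf(lam * u(z))

# total probability mass for an asymmetric, heavy-tailed configuration
area = quad(lambda x: mstn_pdf(x, 1.0, 2.0, 3.0, 4.0), -np.inf, np.inf)[0]
```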

2.1. Properties

The MStN distribution has a series of interesting formal properties. The following proposition lists some of them in the standardized case with μ = 0 and σ = 1 .
Proposition 1. 
The MStN distribution satisfies the following properties:
1. If Z ∼ MStN(λ, ν), then −Z ∼ MStN(−λ, ν).
2. If Z ∼ MStN(λ, ν), then |Z| has the half-t_ν distribution, with pdf 2t_ν(z), z > 0; in particular, if ν → ∞, then |Z| converges to HN(0, 1).
3. MStN(0, ν) = t_ν.
4. MStN(λ, 1) = MSCN(λ) (modified skew-Cauchy-normal).
5. If ν → ∞, then MStN(λ, ν) → MSN(λ); in particular, if λ = 0, then MStN(0, ν) → MSN(0) = N(0, 1).
6. If Z | V = v ∼ MSN(0, 1/√v, √v λ), with conditional pdf
$$ f_{Z|V=v}(z;\lambda) = 2\sqrt{v}\,\phi(\sqrt{v}\,z)\,\Phi(\lambda u(z)), \quad z\in\mathbb{R}, $$
where V ∼ Gamma(ν/2, ν/2), then Z ∼ MStN(λ, ν).
7. If Z | S = s ∼ StN(s, ν), with conditional pdf
$$ f_{Z|S=s}(z;\nu) = 2\,t_\nu(z)\,\Phi(sz), \quad z\in\mathbb{R}, $$
where S ∼ N(λ, 1), then Z ∼ MStN(λ, ν).
8. If Z | V = v, S = s ∼ SN(0, 1/√v, s/√v), with conditional pdf
$$ f_{Z|V=v,S=s}(z) = 2\sqrt{v}\,\phi(\sqrt{v}\,z)\,\Phi(sz), \quad z\in\mathbb{R}, $$
where V ∼ Gamma(ν/2, ν/2) and S ∼ N(λ, 1) are independent, then Z ∼ MStN(λ, ν).
Property 8 shows the genesis of the MStN distribution, indicating that it belongs to the family of scale-shape mixtures of SN distributions defined in Arellano-Valle et al. [10], with gamma and normal mixing distributions for the scale variable V and the shape variable S, respectively. The proof of this property is obtained by integrating the conditional pdf f_{Z|V=v,S=s}(z) over (v, s), and then using the following well-known facts: (i) if V ∼ Gamma(ν/2, ν/2), then ∫₀^∞ √v φ(√v z) f_V(v) dv = t_ν(z); (ii) if S ∼ N(λ, 1), then E[Φ(Sz)] = Φ(λu(z)). Properties 6 and 7 are direct consequences of Property 8 by the removal of one of the mixing variables. Property 6 represents the MStN distribution as a skew-scale mixture of the MSN distribution (Ferreira et al. [11]), while Property 7 represents it as a shape mixture of the StN distribution. The remaining properties are proved straightforwardly.
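Property 8 also gives a convenient way to simulate from the MStN distribution. The sketch below (function names ours) draws V from a Gamma(ν/2, ν/2) law, read as shape-rate, draws S from N(λ, 1), and then generates the conditional SN variate through the standard representation X = δ|U₁| + √(1−δ²)U₂ with δ = α/√(1+α²):

```python
import numpy as np
from scipy.stats import norm, t
from scipy.integrate import quad

rng = np.random.default_rng(2023)

def u(z):
    return z / np.sqrt(1.0 + z ** 2)

def mstn_rvs(lam, nu, size, rng):
    """Draw Z ~ MStN(lam, nu) via Property 8:
    V ~ Gamma(nu/2, rate nu/2) and S ~ N(lam, 1) independent,
    then Z | V=v, S=s ~ SN(0, 1/sqrt(v), s/sqrt(v))."""
    v = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=size)   # scale = 1/rate
    s = rng.normal(loc=lam, scale=1.0, size=size)
    delta = s / np.sqrt(v + s ** 2)        # delta of the conditional SN law
    u1 = np.abs(rng.standard_normal(size))
    u2 = rng.standard_normal(size)
    return (delta * u1 + np.sqrt(1.0 - delta ** 2) * u2) / np.sqrt(v)

z = mstn_rvs(lam=2.0, nu=5.0, size=200_000, rng=rng)
# empirical CDF at one point vs numerical integration of the density (2)
cdf_num = quad(lambda x: 2.0 * t.pdf(x, 5) * norm.cdf(2.0 * u(x)), -np.inf, 0.5)[0]
cdf_emp = np.mean(z <= 0.5)
```

The empirical CDF of the draws is compared with a numerical integral of the density (2) as a check.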

2.2. Moments

For ν > k, the moment of order k of Z ∼ MStN(λ, ν) is given by
$$ \begin{aligned} E(Z^k) &= \int_{-\infty}^{\infty} 2 z^k\, t_\nu(z)\,\Phi(\lambda u(z))\,dz \\ &= \int_{0}^{\infty} 2(-z)^k\, t_\nu(z)\,\{1-\Phi(\lambda u(z))\}\,dz + \int_{0}^{\infty} 2 z^k\, t_\nu(z)\,\Phi(\lambda u(z))\,dz \\ &= \begin{cases} \displaystyle\int_{0}^{\infty} 2 z^k\, t_\nu(z)\,dz & \text{for } k \text{ even}, \\[4pt] \displaystyle -\int_{0}^{\infty} 2 z^k\, t_\nu(z)\,dz + 2\int_{0}^{\infty} 2 z^k\, t_\nu(z)\,\Phi(\lambda u(z))\,dz & \text{for } k \text{ odd}. \end{cases} \end{aligned} $$
That is, we have
$$ E(Z^k) = \begin{cases} E(|Z|^k) & \text{for } k \text{ even}, \\ -E(|Z|^k) + 2\,E\{|Z|^k\,\Phi(\lambda u(|Z|))\} & \text{for } k \text{ odd}, \end{cases} $$
where, by Property 2, the random variable |Z| has a half-t_ν distribution with pdf 2t_ν(u), u > 0. This means that |Z| has the same distribution as V^{−1/2}|Z₀|, where V ∼ Gamma(ν/2, ν/2) is independent of Z₀ ∼ N(0, 1), and so of |Z₀| ∼ HN(0, 1), the half-normal distribution with pdf 2φ(u), u > 0. Thus, we find for ν > k and each k ≥ 1 that E(|Z|^k) = E(V^{−k/2})E(|Z₀|^k), with
$$ E(V^{-k/2}) = \frac{(\nu/2)^{k/2}\,\Gamma\big((\nu-k)/2\big)}{\Gamma(\nu/2)} =: \nu_k \ \ (\nu > k), \qquad E(|Z_0|^k) = \frac{\Gamma\big((k+1)/2\big)}{(1/2)^{k/2}\,\Gamma(1/2)} =: c_k. \tag{4} $$
These results are summarized in the following proposition.
Proposition 2.
If ν > k, with k ≥ 1, then the random variable Z ∼ MStN(λ, ν) has moment of order k, μ_k = E(Z^k), given by
$$ \mu_k = \begin{cases} c_k\,\nu_k & \text{for } k \text{ even}, \\ 2 b_k - c_k\,\nu_k & \text{for } k \text{ odd}, \end{cases} $$
with c_k and ν_k defined in (4), and b_k := b_k(λ, ν) = ∫₀^∞ 2 x^k t_ν(x) Φ(λu(x)) dx.
Note in Proposition 2 that b_k(−λ, ν) = c_k ν_k − b_k(λ, ν), b_k(0, ν) = c_k ν_k/2, lim_{λ→∞} b_k(λ, ν) = c_k ν_k and lim_{λ→−∞} b_k(λ, ν) = 0. In particular, when k is odd, the first of these relationships implies that μ_k(−λ) = −μ_k(λ). For k even, μ_k is constant in λ and hence becomes an even function of λ.
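Proposition 2 can be verified numerically; the sketch below (names `c_k`, `nu_k`, `b_k`, `mstn_moment` are ours) evaluates b_k by quadrature and cross-checks an odd moment against direct integration of z^k f(z):

```python
import numpy as np
from math import gamma
from scipy.stats import norm, t
from scipy.integrate import quad

def u(z):
    return z / np.sqrt(1.0 + z ** 2)

def c_k(k):
    # c_k = Gamma((k+1)/2) / ((1/2)^(k/2) Gamma(1/2)), Equation (4)
    return gamma((k + 1) / 2.0) / (0.5 ** (k / 2.0) * gamma(0.5))

def nu_k(k, nu):
    # nu_k = (nu/2)^(k/2) Gamma((nu-k)/2) / Gamma(nu/2), requires nu > k
    return (nu / 2.0) ** (k / 2.0) * gamma((nu - k) / 2.0) / gamma(nu / 2.0)

def b_k(k, lam, nu):
    return quad(lambda x: 2.0 * x ** k * t.pdf(x, nu) * norm.cdf(lam * u(x)),
                0.0, np.inf)[0]

def mstn_moment(k, lam, nu):
    # Proposition 2: c_k nu_k for k even, 2 b_k - c_k nu_k for k odd
    if k % 2 == 0:
        return c_k(k) * nu_k(k, nu)
    return 2.0 * b_k(k, lam, nu) - c_k(k) * nu_k(k, nu)

# cross-check against direct numerical integration of z^3 f(z)
direct = quad(lambda z: z ** 3 * 2.0 * t.pdf(z, 7) * norm.cdf(2.0 * u(z)),
              -np.inf, np.inf)[0]
```

For ν = 7 the second moment equals c₂ν₂ = ν/(ν − 2) = 1.4, the Student's-t value, for any λ.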

2.3. Skewness and Kurtosis

Assuming ν > 3 , the skewness coefficient can be computed by using the results in Proposition 2 and the expression given by
$$ \beta_1 = \frac{\mu_3 - 3\mu_2\mu_1 + 2\mu_1^3}{(\mu_2 - \mu_1^2)^{3/2}}. $$
Since μ_k(−λ) = −μ_k(λ) when k is odd, β₁ is an odd function of λ. This can also be observed in the left panel of Figure 2. The range of its values can be determined in terms of its minimum and maximum for each value of ν, as can be observed in Table 1. These values are obtained from the expression
$$ \lim_{\lambda\to\pm\infty} \beta_1 = \pm\,\frac{c_3\nu_3 - 3 c_2\nu_2\, c_1\nu_1 + 2 c_1^3\nu_1^3}{(c_2\nu_2 - c_1^2\nu_1^2)^{3/2}}. $$
Similarly, for ν > 4 the kurtosis coefficient can be obtained from Proposition 2 and the expression given by
$$ \beta_2 = \frac{\mu_4 - 4\mu_3\mu_1 + 6\mu_2\mu_1^2 - 3\mu_1^4}{(\mu_2 - \mu_1^2)^{2}}. $$
In this case, for k even, μ_k = c_kν_k does not depend on λ, and hence β₂ is an even function of λ, as displayed in the right-hand panel of Figure 2. Again, the minimum and maximum values allow us to compute the range of this coefficient. These values can be obtained from the expressions
$$ \lim_{\lambda\to 0}\beta_2 = \frac{3(\nu-2)}{\nu-4}, \qquad \lim_{\lambda\to\pm\infty}\beta_2 = \frac{c_4\nu_4 - 4 c_3\nu_3\, c_1\nu_1 + 6 c_2\nu_2\, c_1^2\nu_1^2 - 3 c_1^4\nu_1^4}{(c_2\nu_2 - c_1^2\nu_1^2)^{2}}. $$
The ranges of values of the skewness and kurtosis coefficients coincide with the range of the respective coefficients of the StN distribution, as displayed in Table 1 for ν = 5 , 7 , 19 .
In Figure 2 it is observed that the skewness coefficient is an odd function and that the width of the interval decreases as ν increases. When ν → ∞, we obtain the skewness range of the MSN and SN models, which is (−0.995, 0.995). On the other hand, the kurtosis coefficient is an even function. The lower and upper ends of the interval, together with its width, decrease with the value of ν. Once again, when ν → ∞ the kurtosis range of the MSN and SN models is obtained, namely (3, 3.869).
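The skewness and kurtosis coefficients can likewise be evaluated numerically from the moments; a sketch (names ours) that recovers the Student's-t values in the symmetric case and the odd/even symmetry in λ:

```python
import numpy as np
from scipy.stats import norm, t
from scipy.integrate import quad

def moment(k, lam, nu):
    f = lambda z: z ** k * 2.0 * t.pdf(z, nu) * norm.cdf(lam * z / np.sqrt(1.0 + z ** 2))
    return quad(f, -np.inf, np.inf)[0]

def beta1_beta2(lam, nu):
    m1, m2, m3, m4 = (moment(k, lam, nu) for k in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    beta1 = (m3 - 3.0 * m2 * m1 + 2.0 * m1 ** 3) / var ** 1.5
    beta2 = (m4 - 4.0 * m3 * m1 + 6.0 * m2 * m1 ** 2 - 3.0 * m1 ** 4) / var ** 2
    return beta1, beta2

b1_0, b2_0 = beta1_beta2(0.0, 7.0)   # symmetric case: Student-t with nu = 7
```

At λ = 0 the kurtosis equals 3(ν − 2)/(ν − 4), which is 5 for ν = 7.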

3. Inference

In this section we use the MStN(μ, σ, λ, ν) model in order to obtain robust inference on the parameters (μ, σ, λ). The degrees-of-freedom parameter ν, with 0 < ν < ∞, is initially considered known; in a second stage, the value of ν is selected from a grid of values according to the best log-likelihood and AIC.
Given a random sample (x₁, …, xₙ) ∈ ℝⁿ from the random variable X ∼ MStN(μ, σ, λ, ν), the log-likelihood function for the parameter vector θ = (μ, σ, λ) ∈ ℝ × (0, ∞) × ℝ is
$$ \ell(\theta) = n\log(c_\nu) - n\log(\sigma) - \frac{\nu+1}{2}\sum_{i=1}^{n}\log\Big(1+\frac{z_i^2}{\nu}\Big) + \sum_{i=1}^{n}\log\big(2\,\Phi(\lambda u(z_i))\big), \tag{5} $$
where c_ν = Γ((ν+1)/2) / (Γ(ν/2)√(πν)) and z_i = (x_i − μ)/σ.
The associated score functions are
$$ \begin{aligned} \frac{\partial \ell(\theta)}{\partial \mu} &= \frac{1}{\sigma}\sum_{i=1}^{n}\left[-\lambda\,(1+z_i^2)^{-3/2}\,\zeta(\lambda u(z_i)) + \frac{\nu+1}{\nu}\, z_i\Big(1+\frac{z_i^2}{\nu}\Big)^{-1}\right], \\ \frac{\partial \ell(\theta)}{\partial \sigma} &= \frac{1}{\sigma}\sum_{i=1}^{n}\left[-1-\lambda\, z_i\,(1+z_i^2)^{-3/2}\,\zeta(\lambda u(z_i)) + \frac{\nu+1}{\nu}\, z_i^2\Big(1+\frac{z_i^2}{\nu}\Big)^{-1}\right], \\ \frac{\partial \ell(\theta)}{\partial \lambda} &= \sum_{i=1}^{n} u(z_i)\,\zeta(\lambda u(z_i)), \end{aligned} $$
where ζ(x) = φ(x)/Φ(x). From these equations, the ML estimates of (μ, σ, λ) must be obtained numerically.
The MStN Fisher information matrix, computed for n = 1 as i(θ) = E[(∂ℓ(θ)/∂θ)(∂ℓ(θ)/∂θ)ᵀ], has entries given by
$$ \begin{aligned} i_{\mu\mu} &= \frac{1}{\sigma^2}\left[\lambda^2\eta_{03} + \frac{\nu+2}{\nu+3}\right], & i_{\mu\sigma} &= \frac{1}{\sigma^2}\left[\lambda\rho_{05} - 2\lambda\rho_{25} - \lambda^3\rho_{27} - \lambda^2\eta_{13} - \frac{2(\nu+1)}{\nu}\,\delta_2\right], \\ i_{\mu\lambda} &= \frac{1}{\sigma}\left[\rho_{03} - \lambda^2\rho_{25} - \lambda\,\eta_{12}\right], & i_{\sigma\sigma} &= \frac{1}{\sigma^2}\left[\lambda^2\eta_{23} + \frac{2\nu}{\nu+3}\right], \\ i_{\sigma\lambda} &= -\frac{\lambda}{\sigma}\,\eta_{22}, & i_{\lambda\lambda} &= \eta_{21}, \end{aligned} $$
where ρ_{kl} = E[Z^k (1+Z²)^{−l/2} ζ(λu(Z))], with ρ_{kl} = 0 for k odd; η_{kl} = E[Z^k (1+Z²)^{−l} ζ²(λu(Z))]; and δ_k = E[Z(1+Z²/ν)^{−k}]; all expectations are computed assuming Z ∼ MStN(λ, ν).
In the symmetric case λ = 0, we have δ_k = 0, and the Fisher information matrix reduces to
$$ i(\mu,\sigma,0) = \begin{pmatrix} \dfrac{1}{\sigma^2}\,\dfrac{\nu+2}{\nu+3} & 0 & \dfrac{1}{\sigma}\sqrt{\dfrac{2}{\pi}}\,d_1 \\ 0 & \dfrac{2}{\sigma^2}\,\dfrac{\nu}{\nu+3} & 0 \\ \dfrac{1}{\sigma}\sqrt{\dfrac{2}{\pi}}\,d_1 & 0 & \dfrac{2}{\pi}\,d_2 \end{pmatrix}. $$
The expressions for d₁ = E[(1+Z₀²)^{−3/2}] and d₂ = E[Z₀²(1+Z₀²)^{−1}], with Z₀ ∼ t_ν, were computed using the Mathematica software package [12] and are displayed in Appendix A. This is a non-singular matrix, since
$$ |i(\mu,\sigma,0)| = \frac{4\,d_2\,\nu(\nu+2)}{\pi\,\sigma^4(\nu+3)^2}\,\big(1-h(\nu)\big) \neq 0, $$
where h(ν) = (ν+3)d₁² / ((ν+2)d₂) is an increasing function of ν that converges to 0.926 as ν → ∞. This is shown in Figure 3.
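The non-singularity condition can be checked numerically. A sketch (assuming, as reconstructed above, d₁ = E[(1+Z₀²)^{−3/2}] and d₂ = E[Z₀²(1+Z₀²)^{−1}] with Z₀ ∼ t_ν; function names ours):

```python
import numpy as np
from scipy.stats import t
from scipy.integrate import quad

def d1(nu):
    # E[(1 + Z0^2)^(-3/2)] with Z0 ~ t_nu
    return quad(lambda z: (1.0 + z ** 2) ** -1.5 * t.pdf(z, nu), -np.inf, np.inf)[0]

def d2(nu):
    # E[Z0^2 (1 + Z0^2)^(-1)] with Z0 ~ t_nu
    return quad(lambda z: z ** 2 / (1.0 + z ** 2) * t.pdf(z, nu), -np.inf, np.inf)[0]

def h(nu):
    return (nu + 3.0) * d1(nu) ** 2 / ((nu + 2.0) * d2(nu))

def det_info(nu, sigma=1.0):
    # determinant of i(mu, sigma, 0)
    return (4.0 * d2(nu) * nu * (nu + 2.0)
            / (np.pi * sigma ** 4 * (nu + 3.0) ** 2) * (1.0 - h(nu)))
```

As the text states, h(ν) stays below one, so the determinant is strictly positive.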
On the other hand, as ν → ∞, the matrix i(μ, σ, λ) → i_MSN(μ, σ, λ), where i_MSN(μ, σ, λ) is the Fisher information matrix of the MSN(μ, σ, λ) model (see also Arrué et al. [7]).
As the MStN model is a regular parametric model, for each ν > 0 the ML estimator θ̂ of θ is consistent and satisfies √n(θ̂ − θ) →_d N₃(0, (i(θ))^{−1}) as n → ∞. This also holds when ν → ∞, and hence for the ML estimator of θ obtained from the MSN(θ) model.

4. Bias Reduction for the ML Estimates

For each value of ν , the ML estimate of λ obtained from the MStN model overestimates the true value of this parameter. This fact can be seen in Table 2 below.
In addition, the ML estimate of λ could be infinite, with a certain probability, when all observations have the same sign, e.g., if all of them are positive, so that min{z₁, …, zₙ} > 0. This non-zero probability of divergence when estimating λ increases as the true values of λ and ν grow; however, it quickly declines with the sample size. This can be observed in Figure 4.
For the case of the shape parameter with location and scale parameters, the overestimation of these parameters only occurs for λ . This can be noted in Table 3.
As the bias of the ML estimates of μ and σ is virtually zero, it is sensible to apply Firth's method (see Ref. [9]) to reduce the O(n^{−1}) bias of the ML estimate of λ (see Cox and Snell [13]). By doing this, we obtain a new bias-corrected estimate of λ, namely λ̂*, whose bias is of order O(n^{−2}). Since Z ∼ MStN(λ, ν) implies that −Z ∼ MStN(−λ, ν), we only proceed for the case λ > 0.

4.1. Preliminary Results

Consider a regular parametric model with log-likelihood function ℓ(θ), and suppose that the parameter is scalar, for instance θ = λ. In addition, let U(θ) = ℓ′(θ) and j(θ) = −ℓ″(θ), where ℓ′ and ℓ″ are the first and second derivatives of ℓ, respectively, and consider the following expected quantities:
$$ i(\theta) = E_\theta\big(j(\theta)\big), \qquad \nu_{\theta,\theta,\theta}(\theta) = E_\theta\big(\ell'(\theta)^3\big), \qquad \nu_{\theta,\theta\theta}(\theta) = E_\theta\big(\ell'(\theta)\,\ell''(\theta)\big). $$
For a random sample of size n, j(θ) is of order O_p(n), while i(θ), ν_{θ,θ,θ}(θ) and ν_{θ,θθ}(θ) are of order O(n) (see Sartori [8]). In order to obtain the reduced-bias ML estimator, denoted by θ̂*, we must solve the following modified likelihood equation:
$$ U^*(\theta) = U(\theta) + M(\theta) = 0, \tag{6} $$
where M(θ) = i(θ)b(θ) and b(θ) = (1/(2i(θ)²)){ν_{θ,θ,θ}(θ) + ν_{θ,θθ}(θ)} = O(n^{−1}) (see Sartori [8]).
The quasi-likelihood function associated with (6) is given by
$$ \ell^*(\theta) = \int_c^{\theta} U^*(t)\,dt = \ell(\theta) - \ell(c) + \int_c^{\theta} M(t)\,dt, \tag{7} $$
where c is an arbitrary real number. This function allows us to numerically find the reduced-bias estimate of θ, say θ̂*. Moreover, it can be used to compute confidence intervals for θ by means of the likelihood ratio statistic. In fact, since ℓ*(θ) is a penalized likelihood with a bounded penalty function of order O(1), the log-likelihood ratio statistic based on the expression
$$ W^*(\theta) = 2\{\ell^*(\hat\theta^*) - \ell^*(\theta)\} $$
has the usual asymptotic χ²₁ distribution. It can be used to calculate confidence intervals for θ, since it captures the skewness of the log-likelihood better than the normal asymptotic approximation.

4.2. Shape Parameter Case

Consider now the baseline case with μ = 0 and σ = 1, and take a sample of size n, say z₁, …, zₙ, from MStN(λ, ν), with θ = λ being the unknown parameter. In this case, the log-likelihood function is obtained from (5) by letting μ = 0 and σ = 1. Since ν is assumed to be known, it reduces, up to an additive constant, to ℓ(λ) = Σᵢ₌₁ⁿ log(Φ(λu(zᵢ))).
The ML estimate is infinite, with a certain probability, e.g., when min{z₁, …, zₙ} > 0, since then the log-likelihood is an increasing function of λ which converges to zero as λ → ∞. The score function and the observed information are given by U(λ) = Σᵢ₌₁ⁿ u(zᵢ)ζ(λu(zᵢ)) and j(λ) = λΣᵢ₌₁ⁿ u(zᵢ)³ζ(λu(zᵢ)) + Σᵢ₌₁ⁿ u(zᵢ)²ζ²(λu(zᵢ)), respectively.
Now, using the notation a_{kh}(λ) = E_λ{u(Z)^k ζ^h(λu(Z))}, the modified function M(λ) takes the following expression:
$$ M(\lambda) = -\frac{\lambda}{2}\,\frac{a_{42}(\lambda)}{a_{22}(\lambda)}. \tag{8} $$
Remark 1.
Note that a_{k1} = 0 when k is odd, and a_{kh} ≠ 0 when both k and h are even.
The left panel of Figure 5 shows the graphs of the modified function M(λ) for the SN model, the MStN model with ν = 1 and ν = 3, and the MSN model. All of them are bounded, odd functions of λ that tend to zero as λ → ±∞ (see Proposition 3). They take their maximum values at M_SN(−1.07) = 0.83, M_MStN(−2.58) = 0.64 with ν = 1, M_MStN(−2.75) = 0.59 with ν = 3, and M_MSN(−2.96) = 0.55, respectively. Furthermore, the larger the value of ν, the more closely the modified function associated with the MStN distribution approaches that of the MSN model. The right panel of Figure 5 displays the graphs of the integrated modified function; this is an even function of λ, decreasing in |λ|.
In order to guarantee the existence of the estimator λ̂*, the following proposition is needed.
Proposition 3.
Let M(λ) be the modification of the score function for the MStN(λ, ν) distribution. Then M(λ) = O(λ^{−1}) as λ → ±∞, whatever the value of ν > 0; i.e., the tails of M(λ) decay at the rate λ^{−1}.
Proof. 
See Appendix A. □
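The modified function (8) can be evaluated by numerical integration; a sketch (names ours) that exhibits the properties discussed above, namely that M is odd in λ, negative for λ > 0, and decays in the tails:

```python
import numpy as np
from scipy.stats import norm, t
from scipy.integrate import quad

def u(z):
    return z / np.sqrt(1.0 + z ** 2)

def zeta(x):
    return norm.pdf(x) / norm.cdf(x)

def a_kh(k, h, lam, nu):
    """a_kh(lam) = E_lam[u(Z)^k zeta(lam u(Z))^h] with Z ~ MStN(lam, nu)."""
    f = lambda z: (u(z) ** k * zeta(lam * u(z)) ** h
                   * 2.0 * t.pdf(z, nu) * norm.cdf(lam * u(z)))
    return quad(f, -np.inf, np.inf)[0]

def M(lam, nu):
    # modified function (8); negative for lam > 0, odd in lam
    return -0.5 * lam * a_kh(4, 2, lam, nu) / a_kh(2, 2, lam, nu)
```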

First Simulation Study

We performed a simulation study with 5000 replications of a random variable Z following a MStN distribution, assuming that μ = 0, σ = 1 and ν are known, for different sample sizes and different values of λ and ν.
It can be inferred from Table 4 that the parameter λ is overestimated, and that there are cases where the estimate diverges to ∞. Obviously, this depends on the sample size, the degrees of freedom (ν), and the true value of the shape parameter (λ). After applying Firth's method to the shape parameter λ, we obtain a new estimate λ̂*, which always exists and is finite; this is consistent with Proposition 3. The bias reduction achieved by λ̂* is quite good, taking into account that the method is applied whether the estimate λ̂ is finite or infinite. In addition, λ is underestimated when its value is large and the sample size is small. The empirical coverage is very close to the nominal value (95%), although it is slightly lower when the sample size is small; this is because the coverage is affected by the percentage of estimates λ̂ which diverge to infinity.
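The payoff of the method can be illustrated directly: for an all-positive sample the score U(λ) is positive for every λ, so the ML estimate diverges, while the modified equation U(λ) + M(λ) = 0 has a finite root. A sketch (names ours; the sample values are illustrative, not from the paper):

```python
import numpy as np
from scipy.stats import norm, t
from scipy.optimize import brentq
from scipy.integrate import quad

def u(z):
    return z / np.sqrt(1.0 + z ** 2)

def zeta(x):
    return norm.pdf(x) / norm.cdf(x)

def score(lam, z):
    # U(lam) = sum_i u(z_i) zeta(lam u(z_i))
    return np.sum(u(z) * zeta(lam * u(z)))

def a_kh(k, h, lam, nu):
    f = lambda x: (u(x) ** k * zeta(lam * u(x)) ** h
                   * 2.0 * t.pdf(x, nu) * norm.cdf(lam * u(x)))
    return quad(f, -np.inf, np.inf)[0]

def M(lam, nu):
    return -0.5 * lam * a_kh(4, 2, lam, nu) / a_kh(2, 2, lam, nu)

# an all-positive sample: U(lam) > 0 for every lam, so the ML estimate of
# lam is infinite, but the modified equation has a finite root
z = np.array([0.3, 0.8, 1.2, 0.5, 2.1, 0.9, 1.5, 0.4, 0.7, 1.1])
nu = 3.0
lam_star = brentq(lambda l: score(l, z) + M(l, nu), 0.05, 15.0)
```

Near zero the score dominates and the modified score is positive; for large λ the score decays exponentially while M decays only like λ^{−1}, so the bracket contains a sign change and the root is finite.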

4.3. Location, Scale, and Shape Case

Similarly to the previous case (shape parameter only), the ML estimate of λ obtained from the log-likelihood function for (μ, σ, λ) given by (5) could be infinite, with a certain probability, when the random sample satisfies min{x₁, …, xₙ} > μ̂, where μ̂ is the ML estimate of μ. Since the bias of the ML estimates of μ and σ is virtually zero, it seems reasonable to apply the bias reduction method only to the shape parameter λ.
Let ℓ_P(λ) = ℓ(μ̂_λ, σ̂_λ, λ) be the profile log-likelihood for λ, where μ̂_λ and σ̂_λ are the ML estimates of μ and σ for a known value of λ. We also define the profile modified log-likelihood equation as follows:
$$ U_P^*(\lambda) = U_P(\lambda) + M(\lambda) = 0, \tag{9} $$
where
$$ U_P(\lambda) = \frac{\partial \ell_P(\lambda)}{\partial \lambda} = \sum_{i=1}^{n} u\!\left(\frac{x_i-\hat\mu_\lambda}{\hat\sigma_\lambda}\right) \zeta\!\left(\lambda\, u\!\left(\frac{x_i-\hat\mu_\lambda}{\hat\sigma_\lambda}\right)\right) $$
is the profile score function, and M is the modified function given in (8). We also use the profile quasi-log-likelihood associated with (9), given by
$$ \ell_P^*(\lambda) = \int_c^{\lambda} U_P^*(t)\,dt = \ell_P(\lambda) - \ell_P(c) + \int_c^{\lambda} M(t)\,dt, $$
where c is an arbitrary real number. As M is bounded, the likelihood ratio statistic
$$ W_P^*(\lambda) = 2\{\ell_P^*(\hat\lambda^*) - \ell_P^*(\lambda)\} $$
has the usual asymptotic χ²₁ distribution. This is useful to calculate confidence intervals for λ.
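The profile construction can be sketched on a grid of λ values: for each λ the log-likelihood is maximized over (μ, σ), the penalty ∫₀^λ M(t) dt is accumulated by the trapezoidal rule, and the penalized maximizer is read off. Since M ≤ 0 for λ ≥ 0, the penalized estimate can never exceed the unpenalized one (all names ours; this is an illustration, not the paper's exact algorithm):

```python
import numpy as np
from scipy.stats import norm, t
from scipy.optimize import minimize
from scipy.integrate import quad

rng = np.random.default_rng(11)

def u(z):
    return z / np.sqrt(1.0 + z ** 2)

def zeta(x):
    return norm.pdf(x) / norm.cdf(x)

def M(lam, nu):
    # modified function (8), evaluated by numerical integration
    a = lambda k, h: quad(lambda x: (u(x) ** k * zeta(lam * u(x)) ** h
                                     * 2.0 * t.pdf(x, nu) * norm.cdf(lam * u(x))),
                          -np.inf, np.inf)[0]
    return -0.5 * lam * a(4, 2) / a(2, 2)

def profile_loglik(lam, x, nu):
    # l_P(lam): maximize the log-likelihood over (mu, log sigma) for fixed lam
    def nll(p):
        mu, ls = p
        s = np.exp(ls)
        z = (x - mu) / s
        return -np.sum(np.log(2.0 / s * t.pdf(z, nu) * norm.cdf(lam * u(z))))
    res = minimize(nll, x0=np.array([np.median(x), np.log(x.std())]),
                   method="Nelder-Mead")
    return -res.fun

# simulated sample from MStN(0, 1, 2, 3) via the mixture of Property 8
nu, lam_true, n = 3.0, 2.0, 300
v = rng.gamma(nu / 2.0, 2.0 / nu, n)
s = rng.normal(lam_true, 1.0, n)
d = s / np.sqrt(v + s * s)
x = (d * np.abs(rng.standard_normal(n))
     + np.sqrt(1.0 - d * d) * rng.standard_normal(n)) / np.sqrt(v)

# grid versions of l_P and of the penalized l_P* (penalty = int_0^lam M(t) dt)
grid = np.linspace(0.0, 6.0, 31)
lp = np.array([profile_loglik(l, x, nu) for l in grid])
m_vals = np.array([M(l, nu) for l in grid])
penalty = np.concatenate(([0.0],
                          np.cumsum((m_vals[1:] + m_vals[:-1]) / 2.0 * np.diff(grid))))
lam_hat = grid[np.argmax(lp)]
lam_star = grid[np.argmax(lp + penalty)]
```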

Second Simulation Study

Again, we performed a simulation study with 5000 replications of a random variable Z ∼ MStN(μ, σ, λ, ν), with true values μ = 0 and σ = 1 treated as unknown and ν known, for different sample sizes and different values of λ and ν.
In a similar way to the scalar-parameter case, it is observed in Table 5 that the parameter λ is overestimated. Moreover, there are cases where the estimates diverge to ∞ (with a higher percentage than in the previous case). Nevertheless, the ML estimates of the location parameter μ and scale parameter σ behave quite well: they always exist, they are finite, and their bias is close to zero. For that reason, we only apply Firth's method to the parameter λ. The new estimate λ̂* always exists and is finite. Moreover, the bias is reduced whether λ̂ is finite or infinite. Similarly to the previous case (without location and scale parameters), λ is underestimated for large values of λ and small sample sizes, although the underestimation is of smaller magnitude. The empirical coverage is very close to the nominal value (95%), although it is slightly lower when the sample size is small and the value of λ is large; this is because the coverage is affected by the percentage of estimates λ̂ which diverge to infinity.

5. Applications

In this section we present two applications to real datasets; the first dataset is available as Supplementary Materials and the second at http://Lib.stat.cmu.edu/datasets/Plasma_Retinol (accessed on 2 January 2023).

5.1. First Application

We consider a dataset on the nickel concentration in 86 soil samples analyzed by the Mining Department of the Universidad de Atacama, Chile. Table 6 shows the descriptive statistics of this dataset, including the sample skewness coefficient (b₁) and the sample kurtosis coefficient (b₂).
Now, an exploration of the ML estimates for the MStN distribution, assuming different known values of the parameter ν, is carried out to examine the behavior of the log-likelihood function. Table 7 illustrates the performance of the latter function; it is observed that the maximum value of this function (max) occurs when ν = 3.
Table 8 shows the ML estimates of the parameters for the SN, MSN, and MStN models, respectively. Standard errors (shown in brackets) were obtained by inverting the Fisher information matrix for each model. In addition, two model selection measures, the maximum of the log-likelihood function and the Akaike Information Criterion (AIC), are displayed in Table 8. It can be concluded that the model introduced in this paper provides a better fit to the data than the other competing models.
Figure 6 exhibits the histogram of the nickel concentration dataset. Furthermore, we have superimposed the densities of the M S N ( μ ^ , σ ^ , λ ^ ) (dotted line) and M S t N ( μ ^ , σ ^ , λ ^ , 3 ) (solid line) models.
QQ-plots of the MStN and MSN models, and the cdfs of the empirical distribution (ogive) and of the MSN and MStN distributions, are shown in Figure 7 and Figure 8, respectively, obtained from the parameter estimates given above. Both figures confirm the good fit of the model presented in this paper to the data.
Table 9 shows the ML estimates μ̂, σ̂, λ̂ and the modified ML estimate λ̂*.
The value of the modified ML estimate λ̂* is lower than that of the ML estimate λ̂. In addition, by construction, it has a lower bias.
Table 10 displays the confidence intervals for the parameter λ at three different confidence levels. It is observed that the confidence intervals computed using the modified ML estimate λ̂*, IC, are more accurate than the intervals based on the ML estimate λ̂, IC*, since they are narrower.

5.2. Second Application

We present a second application of the proposed model, to a dataset related to betaplasma. The data were obtained from a study of patients (n = 315) whose diet was examined with the object of measuring the plasma concentration of betacarotene (ng/mL). The human body converts betacarotene into vitamin A, which then acts as an antioxidant, preventing oxidation damage to cells.
Table 11 presents a summary of the descriptive statistics, reflecting high dispersion, asymmetry, and a marked kurtosis value.
We studied the ML estimates for the MStN distribution, assuming different known values of the parameter ν. This enabled us to analyze the behavior of the log-likelihood function. Table 12 illustrates its performance; the maximum value of this function (max) occurs when ν = 2.
Table 13 shows the ML estimates of the parameters for the SN, MSN, and MStN models, respectively, and the standard errors (in parentheses). The table also shows two model selection measures, the maximum log-likelihood and the Akaike Information Criterion (AIC). We may conclude that the model presented in this paper provides a better fit to the data than the SN and MSN models.
Figure 9 shows the histogram of the plasma concentration of betacarotene, with the densities of the MSN(μ̂, σ̂, λ̂) model (dotted line) and the MStN(μ̂, σ̂, λ̂, 2) model (continuous line) superimposed.
Bias reduction was applied to the parameter λ of the MStN model, but the same estimated value was obtained as for the ML estimate. This result is not surprising, since when the sample size (n = 315) is relatively large the estimates tend to be more precise.
The left panel of Figure 9 visually shows the good fit of the MStN model with the plasma concentration of the betacarotene data. The right panel presents a close-up of the right tail of the distribution, showing how the proposed model is better at capturing the extreme values than the MSN model.
The QQ plots of the MStN and MSN models, and the cdfs of the empirical distribution (ogive) and of the MSN and MStN models, are shown in Figure 10 and Figure 11, respectively, obtained from the parameter estimates given above. These graphs show the good fit of the proposed model for datasets with high extreme values.

6. Concluding Remarks

In this work, a modified maximum likelihood estimator is proposed to solve the issue of overestimation of the shape parameter λ in the MStN model. The problem is solved by finding a new maximum likelihood estimator (λ̂*) using Firth's method. Although the ML estimate of λ can be finite or infinite, the bias of this new estimator is lower, and its existence is proved. A simulation study was carried out for the shape parameter case and for the case where location, scale, and shape parameters were considered. The conclusions of this simulation analysis are given below:
  • In both cases, the bias of the new ML estimator λ̂* is reduced after applying Firth's method; the methodology satisfactorily reduces the bias even in situations where the estimate of λ is infinite;
  • In the second case, the method is not applied to the parameters μ and σ, since their bias is very close to zero;
  • In the first case, the empirical coverage is very close to the nominal value (95%), being slightly lower when the sample size is small, owing to the higher percentage of infinite values of λ̂. In the second case, the empirical coverage is lower in situations where λ is large and n is small;
  • In the first case, the parameter λ is underestimated when it takes large values and the sample size is small. In the second case, similar results are observed, but the degree of underestimation is of lower magnitude.
Furthermore, the quasi-likelihood profile associated with the modified likelihood equation is considered; its objective is to obtain confidence intervals that better capture the skewness of the log-likelihood of this model. The non-singularity of the Fisher information matrix associated with the MStN distribution when the shape parameter takes the value zero is also verified, which allows asymptotic inference to be performed. Two real datasets were used to apply the methodology presented in this paper. Our findings revealed that better results were obtained with the MStN model than with the SN and MSN models.
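As a rough illustration of the scalar-case procedure, one common implementation of Firth's correction penalizes the log-likelihood with half the log of the expected information (a Jeffreys-type penalty); for this model the expected information for λ is proportional to the integral a22(λ) studied in Appendix A. The code below is our own hedged sketch, with μ = 0, σ = 1 and ν known; the function names and the quadrature implementation are assumptions, not the paper's code.

```python
import numpy as np
from scipy import integrate, optimize, stats

def u(z):
    return z / np.sqrt(1.0 + z**2)

def a22(lam, nu):
    # Expected-information factor for lambda (quadrature version of the
    # integral analyzed in Proposition 3); it decays like lambda^(-3).
    def integrand(z):
        s = lam * u(z)
        return (u(z)**2 * stats.t.pdf(z, df=nu) * stats.norm.pdf(s)**2
                / (stats.norm.cdf(s) * stats.norm.sf(s)))
    val, _ = integrate.quad(integrand, 0.0, np.inf)
    return 2.0 * val

def firth_lambda(z, nu, upper=60.0):
    # Maximize l(lambda) + 0.5 * log a22(lambda); the penalty behaves like
    # -(3/2) log lambda, keeping the estimate finite even when every z_i > 0
    # (the situation in which the unpenalized ML estimate diverges).
    def pen_nll(lam):
        ll = np.sum(stats.norm.logcdf(lam * u(z)))
        return -(ll + 0.5 * np.log(a22(lam, nu)))
    res = optimize.minimize_scalar(pen_nll, bounds=(1e-6, upper), method="bounded")
    return res.x
```

On an all-positive sample the plain log-likelihood increases monotonically in λ, while the penalized criterion attains an interior maximum, which is the qualitative behavior the simulation study exploits.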

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/math11153287/s1. Supplementary file: The database of the first application.

Author Contributions

Conceptualization, J.A. and H.W.G.; methodology, R.B.A.-V. and H.W.G.; software, J.A. and E.C.-O.; validation, R.B.A.-V., O.V. and H.W.G.; formal analysis, R.B.A.-V., E.C.-O. and H.W.G.; investigation, J.A.; writing—original draft preparation, J.A. and R.B.A.-V.; writing—review and editing, R.B.A.-V., E.C.-O. and O.V.; funding acquisition, O.V. and H.W.G. All authors have read and agreed to the published version of the manuscript.

Funding

The research of J. Arrué and H.W. Gómez was supported by SEMILLERO UA-20233, which is an internal fund of the Universidad de Antofagasta, Chile.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The first dataset is available as Supplementary Material and the second at http://Lib.stat.cmu.edu/datasets/Plasma_Retinol (accessed on 2 January 2023).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Coefficients $d_1$ and $d_2$:
d 1 = 2 π ν Γ ν 2 π Γ ν + 2 2 2 F 1 1 2 , 3 2 , 2 ν 2 , ν 2 ν ν / 2 Γ 2 ν 2 Γ ν + 1 2 Γ ν + 3 2 2 F 1 ν + 1 2 , ν + 3 2 , ν + 2 2 , ν ,
and
d 2 = ν 2 ν + 1 ( 1 ν ) ν + 3 2 ν Γ ν 2 2 ν ( 1 ν ) ν + 1 2 ν Γ ν 2 2 ( ν 3 ) 2 F 1 1 2 , 1 , 4 ν 2 , ν 2 ( ν 1 ) 2 F 1 1 2 , 1 , 4 ν 2 , ν 2 ( ν 1 ) ν ν + 1 2 Γ ν 2 Γ ( ν ) .
Observed information matrix:
The observed information matrix associated with (5), for a random sample of size n from the MStN distribution, has entries $j_{\theta_i\theta_j} = -\partial^2 l(\boldsymbol{\theta})/\partial\theta_i\partial\theta_j$, $i,j = 1,2,3$, given by
$$j_{\mu\mu} = \frac{n}{\sigma^2}\left[3\lambda\bar{\rho}_{15} + \lambda^3\bar{\rho}_{17} + \lambda^2\bar{\eta}_{03} + \frac{\nu+1}{\nu}\left(2\bar{\delta}_{02} - \bar{\delta}_{01}\right)\right], \qquad j_{\mu\sigma} = -\frac{n}{\sigma^2}\left[\lambda\bar{\rho}_{05} - 2\lambda\bar{\rho}_{25} - \lambda^3\bar{\rho}_{27} - \lambda^2\bar{\eta}_{13} - \frac{2(\nu+1)}{\nu}\bar{\delta}_{12}\right],$$
$$j_{\mu\lambda} = \frac{n}{\sigma}\left[\bar{\rho}_{03} - \lambda^2\bar{\rho}_{25} - \lambda\bar{\eta}_{12}\right], \qquad j_{\sigma\sigma} = -\frac{n}{\sigma^2}\left[1 + \lambda\bar{\rho}_{13} + \lambda\bar{\rho}_{15} - 2\lambda\bar{\rho}_{35} - \lambda^3\bar{\rho}_{37} - \lambda^2\bar{\eta}_{23} - \frac{2(\nu+1)}{\nu}\bar{\delta}_{22} - \frac{\nu+1}{\nu}\bar{\delta}_{21}\right],$$
$$j_{\sigma\lambda} = \frac{n}{\sigma}\left[\bar{\rho}_{13} - \lambda^2\bar{\rho}_{35} - \lambda\bar{\eta}_{22}\right], \qquad j_{\lambda\lambda} = n\left(\lambda\bar{\rho}_{33} + \bar{\eta}_{21}\right),$$
where $\bar{\eta}_{km} = \frac{1}{n}\sum_{i=1}^{n} z_i^k\,\zeta_i^2\,(1+z_i^2)^{-m}$, $\bar{\rho}_{km} = \frac{1}{n}\sum_{i=1}^{n} z_i^k\,\zeta(\lambda u(z_i))\,(1+z_i^2)^{-m/2}$, and $\bar{\delta}_{km} = \frac{1}{n}\sum_{i=1}^{n} z_i^k\,(1+z_i^2/\nu)^{-m}$, with $\zeta_i = \zeta(\lambda u(z_i))$ and $\zeta(x) = \phi(x)/\Phi(x)$.
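The entry for λ can be checked numerically against a finite-difference second derivative of the log-likelihood in λ. The snippet below is our own sanity check; it fixes μ = 0, σ = 1 and assumes u(z) = z/√(1+z²) and ζ(x) = φ(x)/Φ(x).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
z = rng.standard_t(df=3, size=40)    # standardized observations z_i
lam, nu = 2.0, 3
uz = z / np.sqrt(1.0 + z**2)

def loglik(l):
    # log-likelihood as a function of lambda only (mu = 0, sigma = 1)
    return np.sum(np.log(2.0) + stats.t.logpdf(z, df=nu)
                  + stats.norm.logcdf(l * uz))

# analytic entry: j_ll = n * (lam * rho33_bar + eta21_bar)
zeta = stats.norm.pdf(lam * uz) / stats.norm.cdf(lam * uz)
rho33 = np.mean(z**3 * zeta / (1.0 + z**2) ** 1.5)
eta21 = np.mean(z**2 * zeta**2 / (1.0 + z**2))
j_ll = len(z) * (lam * rho33 + eta21)

# central second difference of -l at lam
h = 1e-4
j_num = -(loglik(lam + h) - 2.0 * loglik(lam) + loglik(lam - h)) / h**2
```

The two quantities agree to several decimal places, which is a cheap way to guard against sign or index errors when transcribing the entries above.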
Proof of Proposition 3: 
Following Sartori [8], we will first show that $a_{22}(\lambda)$ is symmetric and decreasing with respect to $\lambda$. In fact, note from the definition of $a_{22}(\lambda)$ that
$$a_{22}(\lambda) = 2\int_0^{\infty}\frac{u^2(z)\,t_\nu(z)\,\phi^2(\lambda u(z))}{\Phi(\lambda u(z))}\,dz + 2\int_0^{\infty}\frac{u^2(z)\,t_\nu(z)\,\phi^2(\lambda u(z))}{1-\Phi(\lambda u(z))}\,dz = 2\int_0^{\infty}\frac{u^2(z)\,t_\nu(z)\,\phi^2(\lambda u(z))}{\Phi(\lambda u(z))[1-\Phi(\lambda u(z))]}\,dz = a_{22}(-\lambda),$$
since the last integrand is an even function of $\lambda$. Then, $a_{22}(\lambda)$ is symmetric with respect to $\lambda$. Now, we analyze the sign of the derivative of $a_{22}(\lambda)$ with respect to $\lambda > 0$; we have that
$$\frac{\partial a_{22}(\lambda)}{\partial\lambda} = -2\int_0^{\infty}\frac{u^3(z)\,t_\nu(z)\,\phi^2(\lambda u(z))}{\Phi^2(\lambda u(z))[1-\Phi(\lambda u(z))]^2}\,H(\lambda u(z))\,dz,$$
where $H(s) = 2s\,\Phi(s)[1-\Phi(s)] + \phi(s)[1-2\Phi(s)]$. But $s\,\Phi(-s) < \phi(s)$ for all $s > 0$ (see Sartori [8]), so $H(s) > 0$. Therefore, $a_{22}(\lambda)$ is a decreasing function of $\lambda$ for $\lambda > 0$.
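The positivity of H is easy to spot-check numerically. The snippet below is our own check, using the sign convention H(s) = 2sΦ(s)[1 − Φ(s)] + φ(s)[1 − 2Φ(s)]:

```python
import numpy as np
from scipy import stats

def H(s):
    # 2 s Phi(s)[1 - Phi(s)] + phi(s)[1 - 2 Phi(s)], written with the
    # survival function sf = 1 - Phi for numerical stability in the tail
    Phi = stats.norm.cdf(s)
    S = stats.norm.sf(s)
    phi = stats.norm.pdf(s)
    return 2.0 * s * Phi * S + phi * (S - Phi)

s = np.linspace(1e-3, 8.0, 800)
```

Evaluating H on the grid confirms it stays strictly positive, with H(0) = 0, consistent with the monotonicity argument above.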
Due to the symmetry of $a_{22}(\lambda)$, we are only interested in analyzing the convergence of the right tail, assuming $\lambda > 1$. In order to complete the proof, it is convenient to consider the following facts, which hold for all $z > 0$:
(i) For $\lambda > 1$, $\frac{1}{\lambda}u(\lambda z) < u(z) < u(\lambda z) < z < \lambda z$.
(ii) $\Phi(z)$ and $u(z)$ are increasing functions of $z$, whereas $t_\nu(z)$ is a decreasing one.
(iii) $\frac{1}{1-\Phi(\lambda u(z))} < \frac{1}{1-\Phi(\lambda z)} < \frac{1}{1-\Phi(1)} = c$ for $0 < z < 1/\lambda$ and $\lambda > 0$, with $c$ constant.
(iv) $\frac{\phi(z)}{1-\Phi(z)} < \frac{z^2+1}{z}$.
(v) $1 < z < \lambda \;\Rightarrow\; \frac{z}{2} < \lambda u(z/\lambda) \;\Rightarrow\; \phi(\lambda u(z/\lambda)) < \phi(z/2)$.
(vi) $1 < \lambda < z \;\Rightarrow\; \frac{\lambda}{2} < \lambda u(z/\lambda) \;\Rightarrow\; \phi(\lambda u(z/\lambda)) < \phi(\lambda/2)$.
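Facts (iv)-(vi) can be verified numerically for the choice u(z) = z/√(1+z²). The following spot check is our own illustration, not part of the proof:

```python
import numpy as np
from scipy import stats

def u(z):
    # odd, bounded transform appearing in the MStN skewing argument (assumed)
    return z / np.sqrt(1.0 + z**2)

lam = 7.0
z_pos = np.linspace(0.05, 10.0, 400)        # grid for (iv)
z_mid = np.linspace(1.01, lam - 0.01, 200)  # 1 < z < lam, for (v)
z_big = np.linspace(lam + 0.01, 50.0, 200)  # z > lam, for (vi)

# Mills ratio phi(z)/(1 - Phi(z)), computed with the survival function
mills = stats.norm.pdf(z_pos) / stats.norm.sf(z_pos)
```

The bound (iv) is a standard Mills-ratio inequality; (v) and (vi) follow because u(z/λ) ≥ (z/λ)/√2 on the relevant ranges, which the grid checks confirm.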
Now, we decompose $a_{22}(\lambda)$ as shown below:
$$a_{22}(\lambda) = A(\lambda) + B(\lambda) = 2\int_0^{\infty}\frac{u^2(z)\,t_\nu(z)\,\phi^2(\lambda u(z))}{\Phi(\lambda u(z))}\,dz + 2\int_0^{\infty}\frac{u^2(z)\,t_\nu(z)\,\phi^2(\lambda u(z))}{1-\Phi(\lambda u(z))}\,dz.$$
Analysis of $A(\lambda)$:
For all $z > 0$ and $\lambda > 1$, we have that $1 < 1/\Phi(\lambda u(z)) < 2$; then $A^*(\lambda) < A(\lambda) < 2A^*(\lambda)$, where
$$A^*(\lambda) = 2\int_0^{\infty}u^2(z)\,t_\nu(z)\,\phi^2(\lambda u(z))\,dz = I_3 + I_4,$$
with
$$I_3 = 2\int_0^{1/\lambda}u^2(z)\,t_\nu(z)\,\phi^2(\lambda u(z))\,dz, \qquad I_4 = 2\int_{1/\lambda}^{\infty}u^2(z)\,t_\nu(z)\,\phi^2(\lambda u(z))\,dz.$$
Now we have the following inequality:
$$O(\lambda^{-3}) = I_1 < A^*(\lambda) < 2(I_3 + I_4) < O(\lambda^{-3}),$$
where the bounds $I_1$, $I_3$ and $I_4$ are obtained by using (i), (ii) and making the change of variable $r = \lambda z$, i.e.,
$$I_1 = \frac{2}{\lambda^2}\int_0^{\infty}u^2(\lambda z)\,t_\nu(\lambda z)\,\phi^2(\lambda z)\,dz = \frac{1}{\lambda^3}I_2 = O(\lambda^{-3}), \qquad I_2 = 2\int_0^{\infty}u^2(r)\,t_\nu(r)\,\phi^2(r)\,dr < \infty,$$
$$I_3 < 2\int_0^{1/\lambda}u^2(z)\,t_\nu(u(\lambda z))\,\phi^2(u(\lambda z))\,dz < 2\int_0^{1/\lambda}z^2\,t_\nu(u(\lambda z))\,\phi^2(u(\lambda z))\,dz = \frac{1}{\lambda^3}I_6 = O(\lambda^{-3}), \qquad I_6 = 2\int_0^{1}r^2\,t_\nu(u(r))\,\phi^2(u(r))\,dr < \infty,$$
$$I_4 = \frac{2}{\lambda}\int_1^{\infty}u^2(r/\lambda)\,t_\nu(r/\lambda)\,\phi^2(\lambda u(r/\lambda))\,dr = I_7^{2,2} + I_8^{2,2},$$
where $I_7^{2,2}$ and $I_8^{2,2}$ are obtained from the expressions below by using (v) and (vi), i.e.,
$$I_7^{k,l} = \frac{2}{\lambda}\int_1^{\lambda}u^k(r/\lambda)\,t_\nu(r/\lambda)\,\phi^l(\lambda u(r/\lambda))\,dr < \frac{2}{\lambda^{k+1}}\int_1^{\lambda}r^k\,t_\nu(r/\lambda)\,\phi^l(r/2)\,dr = I_{12}, \qquad I_{12} \le \frac{1}{\lambda^{k+1}}I_9 = O(\lambda^{-(k+1)}), \qquad I_9 = 2\,t_\nu(0)\int_0^{\infty}r^k\,\phi^l(r/2)\,dr < \infty,$$
$$I_8^{k,l} = \frac{2}{\lambda}\int_{\lambda}^{\infty}u^k(r/\lambda)\,t_\nu(r/\lambda)\,\phi^l(\lambda u(r/\lambda))\,dr < \frac{2}{\lambda}\,\phi^l(\lambda/2)\int_{\lambda}^{\infty}u^k(r/\lambda)\,t_\nu(r/\lambda)\,dr = I_{13}, \qquad I_{13} = \phi^l(\lambda/2)\,I_{10} = O(e^{-\lambda^2}), \qquad I_{10} = 2\int_1^{\infty}u^k(v)\,t_\nu(v)\,dv < \infty.$$
Next,
$$I_4 = I_7^{2,2} + I_8^{2,2} < O(\lambda^{-3}) + O(e^{-\lambda^2}) = O(\max\{\lambda^{-3}, e^{-\lambda^2}\}) = O(\lambda^{-3}).$$
Finally, as $A^*(\lambda) = \Omega(\lambda^{-3})$, we then have that $A(\lambda) = \Omega(\lambda^{-3})$.
Analysis of $B(\lambda)$:
In this case, we have that $A^*(\lambda) = \Omega(\lambda^{-3}) < B(\lambda)$, since $1-\Phi(\lambda u(z)) < 1$. Now, we find an upper bound for $B(\lambda)$:
$$B(\lambda) = B_1(\lambda) + B_2(\lambda) = 2\int_0^{1/\lambda}\frac{u^2(z)\,t_\nu(z)\,\phi^2(\lambda u(z))}{1-\Phi(\lambda u(z))}\,dz + 2\int_{1/\lambda}^{\infty}\frac{u^2(z)\,t_\nu(z)\,\phi^2(\lambda u(z))}{1-\Phi(\lambda u(z))}\,dz.$$
Next, by using (iii), we have that
$$B_1(\lambda) < 2c\int_0^{1/\lambda}u^2(z)\,t_\nu(z)\,\phi^2(\lambda u(z))\,dz = c\,I_3 < O(\lambda^{-3}).$$
Now, for $B_2(\lambda)$, we use (iv) and make the change of variable $r = \lambda z$:
$$B_2(\lambda) < 2\int_{1/\lambda}^{\infty}u^2(z)\,t_\nu(z)\,\phi(\lambda u(z))\,\frac{(\lambda u(z))^2+1}{\lambda u(z)}\,dz = I_{11},$$
$$I_{11} = 2\lambda\int_{1/\lambda}^{\infty}u^3(z)\,t_\nu(z)\,\phi(\lambda u(z))\,dz + \frac{2}{\lambda}\int_{1/\lambda}^{\infty}u(z)\,t_\nu(z)\,\phi(\lambda u(z))\,dz = 2\int_1^{\infty}u^3(r/\lambda)\,t_\nu(r/\lambda)\,\phi(\lambda u(r/\lambda))\,dr + \frac{2}{\lambda^2}\int_1^{\infty}u(r/\lambda)\,t_\nu(r/\lambda)\,\phi(\lambda u(r/\lambda))\,dr = \lambda\left(I_7^{3,1} + I_8^{3,1}\right) + \frac{1}{\lambda^2}\left\{\lambda\left(I_7^{1,1} + I_8^{1,1}\right)\right\}.$$
Therefore,
$$I_{11} = \lambda\left(I_7^{3,1} + I_8^{3,1}\right) + \frac{1}{\lambda^2}\left\{\lambda\left(I_7^{1,1} + I_8^{1,1}\right)\right\} < \lambda\left(O(\lambda^{-4}) + O(e^{-\lambda^2})\right) + \frac{1}{\lambda}\left(O(\lambda^{-2}) + O(e^{-\lambda^2})\right) < O\left(\max\{\lambda^{-3},\,\lambda e^{-\lambda^2},\,\lambda^{-1}e^{-\lambda^2}\}\right) = O(\lambda^{-3}).$$
Then, it is concluded that $B(\lambda) = \Omega(\lambda^{-3})$, and therefore $a_{22}(\lambda) = \Omega(\lambda^{-3})$. In a similar way, the monotonicity, symmetry and order of convergence of $a_{42}(\lambda)$ with respect to $\lambda$ can be proved. In that case, $a_{42}(\lambda) = \Omega(\lambda^{-5})$ and then $M(\lambda) = \Omega(\lambda^{-1})$.□
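The conclusions of the proposition, symmetry, monotonicity and the λ^(−3) order of a22(λ), can be illustrated by direct quadrature. The snippet below is our own numerical check, taking ν = 3 and assuming u(z) = z/√(1+z²).

```python
import numpy as np
from scipy import integrate, stats

def a22(lam, nu=3):
    # a22(lam) = 2 * Int_0^inf u^2 t_nu phi^2(lam u) / {Phi(lam u)[1 - Phi(lam u)]} dz
    def integrand(z):
        uz = z / np.sqrt(1.0 + z**2)
        s = lam * uz
        return (uz**2 * stats.t.pdf(z, df=nu) * stats.norm.pdf(s)**2
                / (stats.norm.cdf(s) * stats.norm.sf(s)))
    val, _ = integrate.quad(integrand, 0.0, np.inf)
    return 2.0 * val
```

Tabulating λ³ a22(λ) over a range of λ shows it settling near a constant, which is the numerical signature of the Ω(λ^(−3)) order established above.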

References

  1. Arellano-Valle, R.B.; Gómez, H.W.; Quintana, F.A. A New Class of Skew-Normal Distributions. Commun. Stat. Theory Methods 2004, 33, 1465–1480. [Google Scholar] [CrossRef]
  2. Azzalini, A. A class of distributions which includes the normal ones. Scand. J. Stat. 1985, 12, 171–178. [Google Scholar]
  3. Sever, M.; Lajovic, J.; Rajer, B. Robustness of the Fisher's discriminant function to skew-curved normal distribution. Metodološki Zvezki 2005, 2, 231–242. [Google Scholar]
  4. Arnold, B.C.; Castillo, E.; Sarabia, J.M. Distributions with Generalized Skewed Conditionals and Mixtures of such Distributions. Commun. Stat. Theory Methods 2007, 36, 1493–1504. [Google Scholar] [CrossRef]
  5. Gómez, H.W.; Castro, L.M.; Salinas, H.S.; Bolfarine, H. Properties and Inference on the Skew-curved-symmetric Family of Distributions. Commun. Stat. Theory Methods 2010, 39, 884–898. [Google Scholar] [CrossRef]
  6. Arellano-Valle, R.B.; Gómez, H.W.; Salinas, H.S. A note on the Fisher information matrix for the skew-generalized-normal model. SORT 2013, 37, 19–28. [Google Scholar]
  7. Arrué, J.; Arellano-Valle, R.B.; Gómez, H.W. Bias reduction of maximum likelihood estimates for a modified skew normal distribution. J. Stat. Comput. Simul. 2016, 86, 2967–2984. [Google Scholar] [CrossRef]
  8. Sartori, N. Bias prevention of maximum likelihood estimates for scalar skew normal and skew t distributions. J. Stat. Plan. Inference 2006, 136, 4259–4275. [Google Scholar] [CrossRef]
  9. Firth, D. Bias reduction of maximum likelihood estimates. Biometrika 1993, 80, 27–38, Erratum in Biometrika 1993, 82, 667. [Google Scholar] [CrossRef]
  10. Arellano-Valle, R.B.; Ferreira, C.S.; Genton, M.G. Scale and shape mixtures of multivariate skew-normal distributions. J. Multivar. Anal. 2018, 166, 98–110. [Google Scholar] [CrossRef]
  11. Ferreira, C.S.; Bolfarine, H.; Lachos, V.H. Skew scale mixtures of normal distributions: Properties and estimation. Stat. Methodol. 2011, 8, 154–171. [Google Scholar] [CrossRef]
  12. Wolfram Research, Inc. Mathematica, Version 10.0; Wolfram Research, Inc.: Champaign, IL, USA, 2014. [Google Scholar]
  13. Cox, D.R.; Snell, E.J. A general definition of residuals. J. R. Stat. Soc. B Stat. Methodol. 1968, 30, 248–275. [Google Scholar] [CrossRef]
Figure 1. Different plots of the pdf of the MStN distribution for different values of λ when ν = 1 (left panel) and ν = 5 (right panel).
Figure 2. Skewness and kurtosis coefficients for the MStN model.
Figure 3. Graph of the function h ( ν ) .
Figure 4. Probability of divergence for the ML estimate of λ for the MStN model when ν = 1 (left panel) and ν = 5 (right panel).
Figure 5. Modified function (left panel) and integrated modified function (right panel) for SN (dashed line), MStN with ν = 1 (solid line) and ν = 3 (thick solid line) and MSN (dotted line).
Figure 6. Histogram of the nickel concentration data together with the pdf of MSN model (dotted) and MStN (solid).
Figure 7. QQ plots for MStN and MSN distributions.
Figure 8. Empirical cdf (ogive) versus theoretical cdf for the MStN and MSN models.
Figure 9. Histogram of the plasma concentration of betacarotene data with the pdf of the MSN (dotted) and MStN (continuous) models.
Figure 10. QQ plots for MStN and MSN distributions.
Figure 11. Empirical cdf (ogive) versus theoretical cdf for the MStN and MSN models.
Table 1. Skewness and kurtosis ranges for different values of ν .
ν     Skewness Range      Kurtosis Range
5     (−2.550, 2.550)     (9.00, 23.109)
7     (−1.798, 1.798)     (5.000, 9.461)
9     (−1.539, 1.539)     (4.200, 7.054)
11    (−1.407, 1.407)     (3.857, 6.082)
13    (−1.326, 1.326)     (3.667, 5.561)
15    (−1.272, 1.272)     (3.545, 5.237)
17    (−1.233, 1.233)     (3.462, 5.017)
19    (−1.204, 1.204)     (3.400, 4.857)
∞     (−0.995, 0.995)     (3.000, 3.869)
Table 2. ML estimate of λ and empirical (theoretical) percentage of cases when it exists: results from 5000 iterations from the M S t N ( λ , ν ) model.
              n = 20                  n = 50                  n = 100
λ    ν    λ̂ᵃ     % (λ̂ < ∞)        λ̂ᵃ     % (λ̂ < ∞)        λ̂ᵃ     % (λ̂ < ∞)
5    3    6.99    70.68 (71.04)    7.00    95.10 (95.49)    5.82    99.74 (99.80)
     5    7.17    71.48 (72.28)    7.00    95.36 (95.95)    5.86    99.90 (99.84)
     10   7.06    74.52 (73.24)    6.97    96.24 (96.29)    5.83    99.84 (99.86)
10   3    11.52   45.62 (45.09)    14.58   77.32 (77.66)    13.73   95.18 (95.01)
     5    12.31   46.44 (46.15)    15.00   79.08 (78.72)    13.64   96.16 (95.47)
     10   12.67   47.26 (47.00)    14.38   80.40 (79.55)    13.88   95.54 (95.82)
ᵃ Calculated when λ̂ < ∞.
Table 3. Bias of μ ^ , σ ^ and λ ^ , and percentages of cases when λ ^ exists: results in 5000 iterations from the M S t N ( 0 , 1 , λ , ν ) distribution.
n     λ    ν    Bias(μ̂)   Bias(σ̂)   λ̂ᵃ       % (λ̂ < ∞)
50    5    3    0.003      0.006      6.860     83.78
100   5    3    −0.005     0.010      6.797     96.98
200   5    3    −0.001     0.003      5.665     99.88
50    10   3    0.016      −0.010     11.052    61.80
100   10   3    0.004      0.001      13.499    85.24
200   10   3    0.000      0.002      13.211    97.60
50    5    5    0.003      0.004      6.840     85.54
100   5    5    −0.001     0.000      6.637     97.52
200   5    5    −0.001     0.003      5.666     99.96
50    10   5    0.016      −0.011     11.254    61.96
100   10   5    0.004      −0.003     13.645    86.08
200   10   5    0.001      −0.001     12.845    98.00
50    5    10   0.008      −0.006     6.798     87.10
100   5    10   0.000      0.001      6.525     98.22
200   5    10   0.000      0.002      5.641     99.94
50    10   10   0.017      −0.015     11.404    65.28
100   10   10   0.002      −0.001     13.621    87.80
200   10   10   −0.001     0.000      13.200    98.20
ᵃ Calculated when λ̂ < ∞.
Table 4. Bias of λ ^ and λ ^ * , empirical coverage of the 0.95 CI based on W * ( λ ) and empirical (theoretical) percentage of cases when λ ^ exists: results in 5000 iterations from the M S t N ( λ , ν ) distribution.
n     λ    ν    Bias(λ̂)ᵃ   Bias(λ̂*)   % W(λ*)   % (λ̂ < ∞)
20    5    3    1.867       −1.583      0.94      71.64 (71.04)
50    5    3    1.754       −0.298      0.95      95.06 (95.49)
100   5    3    0.788       −0.030      0.95      99.82 (99.80)
20    10   3    1.991       −6.034      0.90      44.78 (45.09)
50    10   3    4.299       −2.866      0.94      76.80 (77.66)
100   10   3    3.856       −0.694      0.94      94.84 (95.01)
20    5    5    2.148       −1.513      0.94      72.62 (72.28)
50    5    5    1.802       −0.293      0.95      96.48 (95.95)
100   5    5    0.815       −0.004      0.95      99.80 (99.84)
20    10   5    2.197       −5.949      0.90      46.46 (46.15)
50    10   5    4.116       −2.751      0.94      79.38 (78.72)
100   10   5    3.862       −0.626      0.95      95.38 (95.47)
20    5    10   2.177       −1.479      0.94      72.82 (73.24)
50    5    10   2.103       −0.236      0.96      96.64 (96.29)
100   5    10   0.776       0.018       0.95      99.90 (99.86)
20    10   10   2.274       −5.888      0.91      47.42 (47.00)
50    10   10   4.169       −2.626      0.94      79.18 (79.55)
100   10   10   4.338       −0.600      0.95      95.88 (95.82)
ᵃ Calculated when λ̂ < ∞.
Table 5. Bias of μ ^ , σ ^ , λ ^ and λ ^ * , empirical coverage of 0.95 CI based on W * ( λ ) and empirical percentage of cases when λ ^ exists: results in 5000 iterations from the M S t N ( 0 , 1 , λ , ν ) model.
n     λ    ν    Bias(μ̂)   Bias(σ̂)   Bias(λ̂)   Bias(λ̂*)   % W(λ*)   % (λ̂ < ∞)
50    5    3    0.004      0.005      1.889      −0.898      0.93      84.24
100   5    3    −0.002     0.005      1.617      −0.306      0.94      97.42
200   5    3    −0.001     0.004      0.712      −0.106      0.95      99.84
50    10   3    0.017      −0.008     1.097      −3.971      0.87      61.86
100   10   3    0.004      −0.003     3.301      −1.699      0.91      85.40
200   10   3    0.001      0.000      3.188      −0.534      0.94      97.96
50    5    5    0.004      0.002      1.828      −0.855      0.94      86.70
100   5    5    −0.002     0.005      1.755      −0.255      0.94      97.58
200   5    5    −0.001     0.002      0.628      −0.098      0.95      99.78
50    10   5    0.016      −0.013     1.185      −3.842      0.87      63.74
100   10   5    0.005      −0.002     3.736      −1.508      0.92      86.72
200   10   5    0.000      0.000      3.053      −0.401      0.94      97.78
50    5    10   0.006      −0.003     1.832      −0.813      0.92      86.84
100   5    10   0.000      0.002      1.523      −0.245      0.95      98.00
200   5    10   0.000      0.000      0.578      −0.108      0.94      99.94
50    10   10   0.014      −0.013     1.530      −3.689      0.88      64.18
100   10   10   0.004      −0.002     3.622      −1.311      0.92      86.40
200   10   10   0.002      −0.002     2.700      −0.450      0.93      98.06
Table 6. Descriptive statistics of the nickel concentration dataset.
Data      n     t̄        s         b₁       b₂
Nickel    86    21.337    16.639    2.355    11.191
Table 7. ML estimates of the parameters of the MStN model when ν is known together with the maximum of the log-likelihood function.
μ̂        σ̂         λ̂        ν    ℓmax
13.092    6.284     0.954     1    −343.057
8.676     11.182    2.235     2    −338.775
7.083     13.767    2.994     3    −338.260
6.335     15.345    3.492     4    −338.483
5.858     16.458    3.875     5    −338.864
21.439    11.410    −0.518    6    −349.399
Table 8. Parameter estimates, standard errors, ℓmax and AIC for SN, MSN, and MStN distributions.
MLE      SN                SN (MSN)          MStN
μ̂       2.625 (1.136)     2.571 (1.260)     7.083 (1.402)
σ̂       24.968 (1.913)    25.027 (2.153)    13.767 (1.838)
λ̂       10.261 (4.751)    10.619 (5.239)    2.994 (0.789)
ν        -                 -                 3
ℓmax     −344.762          −344.769          −338.260
AIC      693.524           693.538           682.520
Table 9. ML estimate and modified ML estimate of μ , σ , λ for the MStN distribution.
μ̂               σ̂                λ̂               λ̂*              ℓ(μ̂, σ̂, λ̂)    ℓ(μ̂, σ̂, λ̂*)
7.083 (1.402)    13.767 (1.838)   2.994 (0.789)   -               −338.260        -
7.083 (1.447)    13.767 (1.843)   -               2.838 (0.731)   -               −338.305
Table 10. Confidence interval for λ .
MLE    95%               98%               99%
IC     (1.696, 4.292)    (1.373, 4.615)    (1.158, 4.830)
IC*    (1.635, 4.040)    (1.336, 4.339)    (1.137, 4.539)
Table 11. Descriptive statistics of the plasma concentration of betacarotene.
Data          n      t̄         s          b₁       b₂
Betaplasma    315    189.895    182.997    3.530    19.792
Table 12. ML estimates of the parameters of the MStN model when ν is known, together with the maximum of the log-likelihood function.
μ̂        σ̂          λ̂       ν    ℓmax
64.737    74.446     2.514    1    −1931.939
52.120    107.392    3.946    2    −1917.631
45.800    127.352    5.081    3    −1918.405
42.250    140.472    5.930    4    −1921.499
39.815    150.177    6.608    5    −1924.907
Table 13. Parameter estimates, standard errors, ℓmax, AIC, and BIC for the SN, MSN, and MStN distributions.
MLE      SN                  MSN                 MStN
μ̂       21.332 (5.180)      21.332 (5.170)      52.120 (5.686)
σ̂       248.613 (10.510)    248.594 (10.505)    107.391 (8.519)
λ̂       18.834 (6.968)      18.954 (6.971)      3.945 (0.728)
ν        -                   -                   2
ℓmax     −1976.317           −1976.319           −1917.631
AIC      3956.634            3956.638            3839.262
BIC      3969.896            3969.896            3852.520
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
