Article

A New Multimodal Modification of the Skew Family of Distributions: Properties and Applications to Medical and Environmental Data

Departamento de Estadística y Ciencia de Datos, Facultad de Ciencias Básicas, Universidad de Antofagasta, Antofagasta 1270300, Chile
*
Author to whom correspondence should be addressed.
Symmetry 2024, 16(9), 1224; https://doi.org/10.3390/sym16091224
Submission received: 27 June 2024 / Revised: 1 September 2024 / Accepted: 8 September 2024 / Published: 18 September 2024

Abstract

The skew distribution is well suited to modeling asymmetric unimodal data. In practice, however, there are many cases in which the data present more than one mode, and many authors have studied extensions of the skew distribution to model this type of data. In this article, a new family is introduced, consisting of a multimodal modification of the family of skew distributions. Using the methodology of weighted distributions, we multiply the density function of a family of skew distributions by a polynomial of degree 4, thus obtaining a more flexible model for data sets whose distribution has at most three modes. The density function, some properties, the moments, and the skewness and kurtosis coefficients of this new family are presented. This study focuses on the particular cases of the skew-normal and Laplace distributions, although the construction can be applied to any other distribution. A simulation study was carried out to examine the behavior of the parameter estimates. Illustrations with real medical and environmental data show the practical performance of the proposed model in the two particular cases presented.

1. Introduction

Let $h$ be a probability density function (pdf) that is symmetric about zero, and let $G$ be a cumulative distribution function (cdf) whose derivative is symmetric about zero. Then,
$$f_Y(y;\lambda) = 2\,h(y)\,G(\lambda y), \qquad -\infty < y < \infty, \tag{1}$$
is a density function for every $\lambda \in \mathbb{R}$ (Azzalini [1]), where $\lambda$ is a skewness parameter; we write $Y \sim SK(\lambda)$. When $h$ and $G$ in (1) are the pdf and cdf of the standard normal distribution, the resulting distribution is called the skew-normal distribution, with density $f_Y(y;\lambda) = 2\,\phi(y)\,\Phi(\lambda y)$, denoted by $Y \sim SN(\lambda)$. Furthermore, a skew-normal random variable with location parameter $\mu$, scale parameter $\sigma > 0$, and skewness parameter $\lambda$ is denoted by $Y \sim SN(\mu,\sigma,\lambda)$.
Although the skew distribution (see [1]) performs well in a wide variety of settings where the data are unimodal, it does not perform well in the presence of multimodality, that is, when the empirical distribution has multiple modes or peaks. Multimodality can arise for different reasons, including the existence of multiple groups or subpopulations with distinct characteristics, or the existence of latent variables that significantly influence the distribution of the population. In such cases, a mixture distribution is one of the first alternatives considered for modeling; however, its use requires addressing the problem of non-identifiability. Various methods for introducing new flexible probability distributions can be found in the statistical literature. Among the many examples, the approaches proposed in Elal-Olivero [2], Gómez et al. [3], Venegas et al. [4], and Bolfarine et al. [5] are especially attractive when proposing a new bimodal distribution. The objective of this article is to develop an alternative multimodal family for the skew-normal distribution, for which we propose a weighted version (Fisher [6] and Rao [7]) of the skew distribution that can present asymmetric shapes with up to three modes. We provide evidence that the new family, being flexible in both asymmetry and number of modes, can outperform some important distributions in the literature.
Gómez-Déniz et al. [8,9] present two extensions of the skew-normal family to model bimodality and multimodality.
The first is defined by
$$f_Y(y;\lambda,a) = g_Y(y)\,\bigl[G_X(\lambda y + a) + G_X(\lambda y - a)\bigr], \tag{2}$$
where $g_Y$ is a density function that is symmetric about zero, $G_X$ is the cdf of a distribution that is also symmetric about zero, and $y, a, \lambda \in \mathbb{R}$.
The second is defined as follows: if $f$ is a pdf symmetric about 0 with corresponding cdf $F$, and $w_\alpha(x) = x - \alpha/x$ with $\alpha \geq 0$, then we have the following family of bimodal asymmetric distributions:
$$g(x;\alpha,\lambda) = \begin{cases} 2\,F(\lambda x)\,f\bigl(w_\alpha(x)\bigr), & x \neq 0, \\ f(0), & x = 0. \end{cases} \tag{3}$$
These models are more flexible than the skew family of distributions, since, depending on the parameter values, they can be unimodal or bimodal. In addition, Reyes et al. [10,11] present bimodal distributions for the exponential and Birnbaum-Saunders cases, respectively. In this paper, we present a modification of the family of skew distributions given in Equation (1), which includes the Azzalini family of skew distributions (see Azzalini [1]) as a particular case. The methodology consists of multiplying Azzalini's proposal by a polynomial of degree 4, which adds one new parameter to the family. This new family is offered as an alternative to the families presented by Gómez-Déniz et al. [8,9].
This article is organized as follows: In Section 2, an expression is obtained for the pdf of the new family along with its most relevant properties: moments, skewness and kurtosis coefficients, and log-likelihood function. In Section 3, the particular case of the normal distribution is studied. A simulation study evaluates the behavior of the estimators for this case, and two applications to real data are shown, one related to medical data and the other to environmental data. In Section 4, the particular case of the Laplace distribution is studied, again with a simulation study of the estimators and an application to environmental data. Finally, Section 5 presents the discussion.

2. Modified Generalized Skew Distribution

2.1. Density Function

Let $Y$ be a random variable, let $h$ be a density function symmetric about zero, and let $G$ be a cumulative distribution function whose density is also symmetric about zero. We say that $Y$ follows a Modified Generalized Skew (MGS) distribution with parameters $\alpha$, which controls the number of modes, and $\lambda$, which controls the skewness, and we write $Y \sim MGS(\alpha,\lambda)$.
Theorem 1.
Let $Y \sim MGS(\alpha,\lambda)$; then, the density function of $Y$ is given by
$$f_Y(y;\alpha,\lambda) = \frac{2}{1+\alpha\rho_4}\,\bigl(1+\alpha y^4\bigr)\,h(y)\,G(\lambda y), \tag{4}$$
where $y \in \mathbb{R}$, $\lambda \in \mathbb{R}$, $\alpha \geq 0$, and $\rho_4$ is the moment of order 4 of a random variable $X$ with a skew distribution of parameter $\lambda$.
Proof. 
$$\int_{-\infty}^{\infty} f_Y(y)\,dy = C_0 \int_{-\infty}^{\infty} \bigl(1+\alpha y^4\bigr)\,2h(y)G(\lambda y)\,dy = C_0\left[\int_{-\infty}^{\infty} 2h(y)G(\lambda y)\,dy + \alpha\int_{-\infty}^{\infty} y^4\,2h(y)G(\lambda y)\,dy\right] = C_0\bigl[1+\alpha E(X^4)\bigr] = C_0\bigl[1+\alpha\rho_4\bigr] = 1,$$
where $C_0 = \dfrac{1}{1+\alpha\rho_4}$ and $\rho_4$ is the moment of order 4 of a random variable $X$ with a skew distribution of parameter $\lambda$. □
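As a numerical illustration of Theorem 1, the following minimal sketch evaluates the MGS density for the normal choice $h=\phi$, $G=\Phi$ and checks that it integrates to one. The function names, the Simpson integrator, the truncation limit of 12, and the parameter values are assumptions made for this example and are not part of the original study. For the skew-normal case the computed fourth moment should be close to 3 and the total mass close to 1.

```python
import math

def phi(y):
    """Standard normal pdf."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def Phi(y):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b]; n must be even."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * step)
    return total * step / 3.0

def skew_pdf(y, lam):
    """Skew density 2 h(y) G(lam * y) with h = phi and G = Phi."""
    return 2.0 * phi(y) * Phi(lam * y)

def mgs_pdf(y, alpha, lam, rho4):
    """MGS density of Theorem 1 for a given fourth moment rho4."""
    return (1.0 + alpha * y ** 4) * skew_pdf(y, lam) / (1.0 + alpha * rho4)

LIMIT = 12.0           # assumed truncation of the real line for the quadrature
lam, alpha = 1.5, 2.0  # illustrative parameter values only

# Fourth moment of the skew distribution, then total mass of the MGS density.
rho4 = simpson(lambda y: y ** 4 * skew_pdf(y, lam), -LIMIT, LIMIT)
total_mass = simpson(lambda y: mgs_pdf(y, alpha, lam, rho4), -LIMIT, LIMIT)
print(rho4, total_mass)
```

The same check can be repeated for any other symmetric pair $(h, G)$ by replacing `phi` and `Phi`.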

2.2. Important Results

In this section, we present some results of the MGS distribution.
Let $Y \sim MGS(\alpha,\lambda)$, with $\lambda \in \mathbb{R}$ and $\alpha \geq 0$; then:
1. $f_Y(y;0,0) = h(y)$.
2. $f_Y(y;\alpha,0) = \dfrac{1}{1+\alpha\rho_4}\,\bigl(1+\alpha y^4\bigr)\,h(y)$.
3. $f_Y(y;0,\lambda) = 2\,h(y)\,G(\lambda y)$.
Item 1 indicates that if both parameters are zero then the family of symmetric density functions is recovered. Item 2 shows that when $\lambda = 0$ a family of symmetric distributions, unimodal or multimodal, is obtained. Finally, Item 3 indicates that if $\alpha = 0$ then the family of skew distributions is obtained.
The above results are illustrated in the following diagram:
$$MGS(\alpha,\lambda)\;\begin{cases} \xrightarrow{\;\alpha=0,\ h=\text{Normal}\;} SN(0,\lambda) \xrightarrow{\;\lambda=0\;} N(0,1) \\[4pt] \xrightarrow{\;\alpha=0,\ h=\text{Logistic}\;} SLOG(0,\lambda) \xrightarrow{\;\lambda=0\;} LOG(0,1) \\[4pt] \xrightarrow{\;\alpha=0,\ h=\text{Laplace}\;} SLP(0,\lambda) \xrightarrow{\;\lambda=0\;} LP(0,1) \end{cases}$$

2.3. Moments

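The following proposition gives the moments of the MGS distribution. These depend on the moments of the skew distribution.
Proposition 1.
If $Y \sim MGS(\alpha,\lambda)$ then for $r = 1, 2, \ldots$ we have
$$\mu_r = E(Y^r) = \frac{1}{1+\alpha\rho_4}\,\bigl(\rho_r + \alpha\rho_{r+4}\bigr),$$
where $\lambda \in \mathbb{R}$, $\alpha \geq 0$, and $\rho_r$ is the moment of order $r$ of a random variable $X$ with a skew distribution of parameter $\lambda$.
Proof. 
$$\mu_r = E(Y^r) = \frac{1}{1+\alpha\rho_4}\int_{-\infty}^{\infty} 2\,y^r\bigl(1+\alpha y^4\bigr)h(y)G(\lambda y)\,dy = \frac{1}{1+\alpha\rho_4}\int_{-\infty}^{\infty} 2\,\bigl(y^r+\alpha y^{r+4}\bigr)h(y)G(\lambda y)\,dy = \frac{1}{1+\alpha\rho_4}\,\bigl(\rho_r+\alpha\rho_{r+4}\bigr). \quad □$$
The first four moments of $Y$ are given in the following corollary:
Corollary 1.
If $Y \sim MGS(\alpha,\lambda)$ then
$$\mu_1 = \frac{\rho_1+\alpha\rho_5}{1+\alpha\rho_4}, \qquad \mu_2 = \frac{\rho_2+\alpha\rho_6}{1+\alpha\rho_4}, \qquad \mu_3 = \frac{\rho_3+\alpha\rho_7}{1+\alpha\rho_4}, \qquad \mu_4 = \frac{\rho_4+\alpha\rho_8}{1+\alpha\rho_4}.$$
Proof. 
Setting $r = 1, 2, 3, 4$ in Proposition 1 gives the results. □
Corollary 2.
If $Y \sim MGS(\alpha,\lambda)$ then
$$E(Y^r;\alpha,-\lambda) = \begin{cases} -E(Y^r;\alpha,\lambda), & \text{if } r \text{ is odd}, \\[4pt] \dfrac{2}{1+\alpha\rho_4}\displaystyle\int_{-\infty}^{\infty} y^r\bigl(1+\alpha y^4\bigr)h(y)\,dy - E(Y^r;\alpha,\lambda), & \text{if } r \text{ is even}. \end{cases}$$
Proof. 
Since $G(-\lambda y) = 1 - G(\lambda y)$,
$$E(Y^r;\alpha,-\lambda) = \frac{2}{1+\alpha\rho_4}\int_{-\infty}^{\infty} y^r\bigl(1+\alpha y^4\bigr)h(y)G(-\lambda y)\,dy = \frac{2}{1+\alpha\rho_4}\int_{-\infty}^{\infty} y^r\bigl(1+\alpha y^4\bigr)h(y)\bigl[1-G(\lambda y)\bigr]\,dy = \frac{2}{1+\alpha\rho_4}\int_{-\infty}^{\infty} y^r\bigl(1+\alpha y^4\bigr)h(y)\,dy - E(Y^r;\alpha,\lambda).$$
For odd $r$ the remaining integral vanishes because $h$ is symmetric about zero, and for even $r$ the expression is the one stated. □
Corollary 3.
If $Y \sim MGS(\alpha,\lambda)$ then
$$\beta_1(\alpha,-\lambda) = -\beta_1(\alpha,\lambda), \qquad \alpha_2(\alpha,-\lambda) = \alpha_2(\alpha,\lambda).$$
Proof. 
Using Corollary 2 and substituting into the standardized skewness ($\beta_1$) and kurtosis ($\alpha_2$) coefficients given by
$$\beta_1 = \frac{\mu_3 - 3\mu_2\mu_1 + 2\mu_1^3}{\bigl(\mu_2-\mu_1^2\bigr)^{3/2}}, \qquad \alpha_2 = \frac{\mu_4 - 4\mu_1\mu_3 + 6\mu_1^2\mu_2 - 3\mu_1^4}{\bigl(\mu_2-\mu_1^2\bigr)^{2}},$$
respectively, the result is obtained. □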

2.4. MGS Distribution with Location and Scale Parameters

The family of distributions $MGS(\alpha,\lambda)$ can be extended by means of a linear transformation that introduces location and scale parameters, adding more flexibility to the model proposed in (4).
Let $Y \sim MGS(\alpha,\lambda)$; then, $Z = \mu + \sigma Y$ follows a Modified Generalized Skew model with location parameter $\mu$ and scale parameter $\sigma$, denoted by $Z \sim MGS(\mu,\sigma,\alpha,\lambda)$, and its density function is given by
$$f_Z(z;\mu,\sigma,\alpha,\lambda) = \frac{2}{\sigma\,(1+\alpha\rho_4)}\left[1+\alpha\left(\frac{z-\mu}{\sigma}\right)^{4}\right] h\!\left(\frac{z-\mu}{\sigma}\right) G\!\left(\lambda\,\frac{z-\mu}{\sigma}\right), \tag{5}$$
where $z \in \mathbb{R}$, $\mu \in \mathbb{R}$, $\sigma > 0$, $\lambda \in \mathbb{R}$, $\alpha \geq 0$, and $\rho_4$ is the moment of order 4 of a random variable $X$ with a skew distribution of parameter $\lambda$.
The moments of $Z \sim MGS(\mu,\sigma,\alpha,\lambda)$ are given in the following proposition.
Proposition 2.
Let $Z \sim MGS(\mu,\sigma,\alpha,\lambda)$; then,
$$E(Z^r) = E\bigl[(\mu+\sigma Y)^r\bigr] = \sum_{i=0}^{r}\binom{r}{i}\mu^{r-i}\sigma^{i}\mu_i = \frac{1}{1+\alpha\rho_4}\sum_{i=0}^{r}\binom{r}{i}\mu^{r-i}\sigma^{i}\bigl(\rho_i+\alpha\rho_{i+4}\bigr),$$
where $\mu_i = E(Y^i)$ and $\rho_r$ is the moment of order $r$ of a random variable $X$ with a skew distribution of parameter $\lambda$.
Proof. 
Expanding $(\mu+\sigma Y)^r$ with the binomial theorem and substituting the moments given in Proposition 1 into $E(Z^r)$, the result is obtained. □
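As a short worked instance of Proposition 2, the cases $r=1$ and $r=2$ can be written out explicitly. Here $\mu$ is the location parameter, while $\mu_1$ and $\mu_2$ denote the first two moments of $Y \sim MGS(\alpha,\lambda)$ from Corollary 1; the variance line is a direct consequence added for illustration.

```latex
\begin{align*}
E(Z)   &= \mu + \sigma\,\mu_1
        = \mu + \sigma\,\frac{\rho_1+\alpha\rho_5}{1+\alpha\rho_4},\\
E(Z^2) &= \mu^2 + 2\mu\sigma\,\mu_1 + \sigma^2\mu_2,\\
\operatorname{Var}(Z) &= E(Z^2) - \bigl[E(Z)\bigr]^2
        = \sigma^2\bigl(\mu_2-\mu_1^2\bigr).
\end{align*}
```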

2.5. Log-Likelihood Function

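Let $z_1, z_2, \ldots, z_n$ be a random sample of a variable $Z$, such that $Z \sim MGS(\theta)$ with $\theta = (\mu,\sigma,\alpha,\lambda)$; then, omitting the additive constant $n\log 2$, the log-likelihood function is
$$\ell(\theta; z) = -n\log(\sigma) - n\log\bigl(1+\alpha\rho_4\bigr) + \sum_{i=1}^{n}\log\left[1+\alpha\left(\frac{z_i-\mu}{\sigma}\right)^{4}\right] + \sum_{i=1}^{n}\log h\!\left(\frac{z_i-\mu}{\sigma}\right) + \sum_{i=1}^{n}\log G\!\left(\lambda\,\frac{z_i-\mu}{\sigma}\right).$$
Taking the partial derivatives of the log-likelihood function with respect to the parameters and solving the resulting system of equations numerically, we obtain the maximum likelihood estimators of the parameters $\mu$, $\sigma$, $\alpha$, and $\lambda$.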

3. Normal Distribution Case

Let us consider the particular case of Equation (5) when $h = \phi$ and $G = \Phi$. If a random variable follows a Modified Generalized Skew Normal (MGSN) distribution then we denote it by $Z \sim MGSN(\mu,\sigma,\alpha,\lambda)$, and its pdf is given by
$$f_Z(z;\mu,\sigma,\alpha,\lambda) = \frac{2}{\sigma\,(1+3\alpha)}\left[1+\alpha\left(\frac{z-\mu}{\sigma}\right)^{4}\right]\phi\!\left(\frac{z-\mu}{\sigma}\right)\Phi\!\left(\frac{\lambda\,(z-\mu)}{\sigma}\right), \tag{6}$$
where $z \in \mathbb{R}$, $\mu \in \mathbb{R}$, $\sigma > 0$, $\lambda \in \mathbb{R}$, and $\alpha \geq 0$.
Figure 1 shows the density function of the proposed MGSN model for $\mu = 0$, $\sigma = 1$, and different values of $\alpha$ and $\lambda$, compared with the Gómez-Déniz [8] model for the normal case, called the Generalized Skew Normal (GSN) distribution. The figure shows the great flexibility of the new distribution, which can model unimodal, bimodal, and trimodal data with only two parameters, while the GSN model is only unimodal using the same number of parameters.
Proposition 3.
If $Y \sim MGSN(\mu,\sigma,\alpha,\lambda)$ then its density function has at most three modes.
Proof. 
Without loss of generality, we consider $Y \sim MGSN(0,1,\alpha,\lambda)$. Since the parameter $\lambda$ only affects the asymmetry, we can assume $\lambda = 0$ in the density given in (6); then,
$$f_Y(y) = \frac{1}{1+3\alpha}\,\bigl(1+\alpha y^4\bigr)\,\phi(y).$$
Differentiating and equating to zero, we have
$$\frac{\partial f_Y(y)}{\partial y} = \frac{1}{1+3\alpha}\Bigl[\bigl(1+\alpha y^4\bigr)\bigl(-y\,\phi(y)\bigr) + 4\alpha y^3\phi(y)\Bigr] = 0,$$
which, after dividing by $\phi(y)/(1+3\alpha) > 0$, is equivalent to
$$-y\,\bigl(1+\alpha y^4\bigr) + 4\alpha y^3 = 0.$$
This is a polynomial equation of degree 5, so it has at most five real roots; since maxima and minima alternate, at most three of them are maxima. For the normal case with $\lambda = 0$ and $\alpha \leq 0.25$, the density is unimodal. Otherwise, it is trimodal when $\alpha$ is finite or bimodal when $\alpha \to \infty$. □
Figure 2 shows that the MGSN model with $\lambda = 0$ is unimodal for $\alpha \in [0, 1/4)$, trimodal for $\alpha > 1/4$, and bimodal when $\alpha \to \infty$.
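The mode count for $\lambda = 0$ can be checked directly. The sketch below solves the critical-point equation $-y(1+\alpha y^4)+4\alpha y^3=0$ in closed form (the nonzero roots satisfy $y^2 = 2 \pm \sqrt{4-1/\alpha}$, which are real only when $\alpha \geq 1/4$) and keeps the critical points that are local maxima. The function names and the finite-difference test for a maximum are choices made for this illustration, not part of the original paper.

```python
import math

def density(y, alpha):
    """MGSN(0, 1, alpha, 0) density: (1 + alpha*y^4) * phi(y) / (1 + 3*alpha)."""
    return ((1.0 + alpha * y ** 4) * math.exp(-0.5 * y * y)
            / (math.sqrt(2.0 * math.pi) * (1.0 + 3.0 * alpha)))

def critical_points(alpha):
    """Real roots of -y*(1 + alpha*y^4) + 4*alpha*y^3 = 0, sorted.

    y = 0 is always a root. The other roots satisfy
    y^2 = 2 +/- sqrt(4 - 1/alpha); at alpha = 1/4 they merge into a
    double root (no extremum), so only alpha > 1/4 is expanded here.
    """
    roots = [0.0]
    if alpha > 0.25:
        disc = math.sqrt(4.0 - 1.0 / alpha)
        for y_squared in (2.0 - disc, 2.0 + disc):
            r = math.sqrt(y_squared)
            roots.extend([-r, r])
    return sorted(roots)

def modes(alpha, eps=1e-4):
    """Critical points at which the density exceeds its close neighbours."""
    return [y for y in critical_points(alpha)
            if density(y, alpha) > density(y - eps, alpha)
            and density(y, alpha) > density(y + eps, alpha)]

for a in (0.1, 0.3, 1.0):
    print(a, modes(a))
```

For $\alpha = 1$ the outer modes sit at $\pm\sqrt{2+\sqrt{3}}$, and the two critical points at $\pm\sqrt{2-\sqrt{3}}$ are the local minima that separate them from the central mode.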

3.1. Moments

The moments of the $MGSN(0,1,\alpha,\lambda)$ distribution are obtained by substituting into Corollary 1 the moments of the skew-normal distribution given by Henze [12]:
$$\mu_1 = \sqrt{\frac{2}{\pi}}\;\frac{\lambda\,\bigl[\lambda^4(8\alpha+1) + 2\lambda^2(10\alpha+1) + 1 + 15\alpha\bigr]}{(1+3\alpha)\,(\lambda^2+1)^{5/2}}, \qquad \mu_2 = \frac{1+15\alpha}{1+3\alpha},$$
$$\mu_3 = \sqrt{\frac{2}{\pi}}\;\frac{\lambda\,\bigl[2\lambda^6(24\alpha+1) + 7\lambda^4(24\alpha+1) + 2\lambda^2(105\alpha+4) + 3 + 105\alpha\bigr]}{(1+3\alpha)\,(\lambda^2+1)^{7/2}}, \qquad \mu_4 = \frac{3+105\alpha}{1+3\alpha}.$$
Figure 3 shows the graphs of the skewness and kurtosis coefficients of the MGSN distribution for $\mu = 0$, $\sigma = 1$, and different values of $\alpha$ and $\lambda$. In the left panel, it can be seen that for a fixed value of $\alpha$ the skewness coefficient is an odd function of $\lambda$. As an example, given $\alpha = 8$, the value of the skewness coefficient for $\lambda = 2$ is $-0.1234$ and for $\lambda = -2$ it is $0.1234$. In the right panel, we can see that for a fixed value of $\alpha$ the kurtosis coefficient is an even function of $\lambda$. For example, given $\alpha = 8$, the value of the kurtosis coefficient for $\lambda = 2$ is $3.8962$ and for $\lambda = -2$ it is $3.8962$.
Figure 4 shows, in the left panel, the profile of the skewness coefficient for different values of $\alpha$. It can be seen that for $\alpha = 0$ the profile coincides with the profile of the skewness coefficient of the skew-normal distribution. Furthermore, through exploratory analysis we can conclude that if $\alpha \to \infty$ and $\lambda \to 0.7923602$ then $\beta_1$ converges to $\pm 1.700501$. Similarly, for $\alpha = 0$ the profile of the kurtosis, shown in the right panel, coincides with the profile of the kurtosis coefficient of the skew-normal distribution. Also, through exploratory analysis, we can conclude that if $\alpha \to \infty$ and $\lambda \to 1.023191$ then the value of $\alpha_2$ converges to $7.878286$, and if $\alpha \to \infty$ and $\lambda \to 0$ then the value of $\alpha_2$ converges to $1.4$.
The skewness and kurtosis values for fixed values of $\alpha$ and $\lambda$, reported in Table 1, show numerically that the skewness and kurtosis coefficients are odd and even functions of $\lambda$, respectively.
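The coefficients can be recomputed from Proposition 1 with a few lines of code. The sketch below uses the raw moments of the standard skew-normal distribution written in terms of $\delta = \lambda/\sqrt{1+\lambda^2}$ (even moments 1, 3, 15, 105; odd moments as polynomials in $\delta$ times $\sqrt{2/\pi}$); the function names are illustrative assumptions. With $\alpha = 1$ and $\lambda = 2$ it gives approximately $-0.2326$ and $2.9395$, in agreement with Table 1.

```python
import math

def sn_moment(r, lam):
    """Raw moment of order r (1 to 8) of the standard skew-normal SN(lam)."""
    even = {2: 1.0, 4: 3.0, 6: 15.0, 8: 105.0}
    if r in even:
        return even[r]  # even moments coincide with those of N(0, 1)
    d = lam / math.sqrt(1.0 + lam * lam)
    odd = {1: d,
           3: 3 * d - d ** 3,
           5: 15 * d - 10 * d ** 3 + 3 * d ** 5,
           7: 105 * d - 105 * d ** 3 + 63 * d ** 5 - 15 * d ** 7}
    return math.sqrt(2.0 / math.pi) * odd[r]

def mgsn_moment(r, alpha, lam):
    """Proposition 1 with rho_4 = 3: (rho_r + alpha*rho_{r+4}) / (1 + 3*alpha)."""
    return (sn_moment(r, lam) + alpha * sn_moment(r + 4, lam)) / (1.0 + 3.0 * alpha)

def skewness_kurtosis(alpha, lam):
    """Standardized skewness (beta_1) and kurtosis (alpha_2) of MGSN(0, 1, alpha, lam)."""
    m1, m2, m3, m4 = (mgsn_moment(r, alpha, lam) for r in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    beta1 = (m3 - 3 * m2 * m1 + 2 * m1 ** 3) / var ** 1.5
    alpha2 = (m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4) / var ** 2
    return beta1, alpha2

print(skewness_kurtosis(1.0, 2.0))
print(skewness_kurtosis(1.0, -2.0))
```

Changing the sign of `lam` flips the sign of the first value and leaves the second unchanged, which is the symmetry stated in Corollary 3.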

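3.2. Estimation

Let $z_1, z_2, \ldots, z_n$ be a random sample of a variable $Z$, such that $Z \sim MGSN(\theta)$ with $\theta = (\mu,\sigma,\alpha,\lambda)$; then, up to an additive constant that does not depend on $\theta$, the log-likelihood function is
$$\ell(\theta;\tilde{z}) = \sum_{i=1}^{n}\log\left[1+\alpha\left(\frac{z_i-\mu}{\sigma}\right)^{4}\right] - n\log(1+3\alpha) - n\log(\sigma) - \sum_{i=1}^{n}\frac{(z_i-\mu)^2}{2\sigma^2} + \sum_{i=1}^{n}\log\Phi\!\left(\lambda\,\frac{z_i-\mu}{\sigma}\right).$$
Differentiating the log-likelihood function and writing $w_i = (z_i-\mu)/\sigma$, the likelihood equations are given by
$$\frac{\partial \ell(\theta;\tilde{z})}{\partial\mu} = -\frac{1}{\sigma}\sum_{i=1}^{n}\frac{4\alpha w_i^{3}}{1+\alpha w_i^{4}} + \frac{1}{\sigma}\sum_{i=1}^{n} w_i - \frac{\lambda}{\sigma}\sum_{i=1}^{n}\frac{\phi(\lambda w_i)}{\Phi(\lambda w_i)} = 0,$$
$$\frac{\partial \ell(\theta;\tilde{z})}{\partial\sigma} = -\frac{1}{\sigma}\sum_{i=1}^{n}\frac{4\alpha w_i^{4}}{1+\alpha w_i^{4}} - \frac{n}{\sigma} + \frac{1}{\sigma}\sum_{i=1}^{n} w_i^{2} - \frac{\lambda}{\sigma}\sum_{i=1}^{n} w_i\,\frac{\phi(\lambda w_i)}{\Phi(\lambda w_i)} = 0,$$
$$\frac{\partial \ell(\theta;\tilde{z})}{\partial\alpha} = \sum_{i=1}^{n}\frac{w_i^{4}}{1+\alpha w_i^{4}} - \frac{3n}{1+3\alpha} = 0,$$
$$\frac{\partial \ell(\theta;\tilde{z})}{\partial\lambda} = \sum_{i=1}^{n} w_i\,\frac{\phi(\lambda w_i)}{\Phi(\lambda w_i)} = 0.$$
The maximum likelihood estimators (MLEs) are obtained by solving these equations. They do not admit an analytical solution, so it is necessary to use iterative numerical methods.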
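The log-likelihood and its four partial derivatives can be coded directly and cross-checked against finite differences before being passed to an optimizer. The sketch below is a minimal standard-library illustration, not the R code used in the study; the function names, the toy sample, and the trial parameter vector are hypothetical.

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf (may underflow to 0 for very negative x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def loglik(theta, z):
    """MGSN log-likelihood, omitting constants that do not depend on theta."""
    mu, sigma, alpha, lam = theta
    n = len(z)
    total = -n * math.log(1.0 + 3.0 * alpha) - n * math.log(sigma)
    for zi in z:
        w = (zi - mu) / sigma
        total += math.log(1.0 + alpha * w ** 4) - 0.5 * w * w + math.log(Phi(lam * w))
    return total

def score(theta, z):
    """Partial derivatives of loglik with respect to (mu, sigma, alpha, lam)."""
    mu, sigma, alpha, lam = theta
    d_mu = d_sigma = d_lam = 0.0
    d_alpha = -3.0 * len(z) / (1.0 + 3.0 * alpha)
    for zi in z:
        w = (zi - mu) / sigma
        q = 1.0 + alpha * w ** 4
        ratio = phi(lam * w) / Phi(lam * w)
        d_mu += (-4.0 * alpha * w ** 3 / q + w - lam * ratio) / sigma
        d_sigma += (-4.0 * alpha * w ** 4 / q - 1.0 + w * w - lam * w * ratio) / sigma
        d_alpha += w ** 4 / q
        d_lam += w * ratio
    return d_mu, d_sigma, d_alpha, d_lam

sample = [-1.2, 0.3, 0.8, 2.1, -0.4]  # hypothetical data, for illustration only
theta0 = (0.1, 1.3, 0.5, 0.7)         # hypothetical trial value of theta
print(loglik(theta0, sample), score(theta0, sample))
```

A gradient-based routine such as the BFGS method can then be supplied with `loglik` (negated) and `score`; the estimates are the parameter values at which all four components of the score vanish.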

3.3. Simulation Study

Many programs provide built-in random number generators, but some probability distributions are not covered by such software. In the case of the MGSN distribution, we use the acceptance–rejection method to generate random numbers from the $MGSN(\mu,\sigma,\alpha,\lambda)$ distribution with the pdf defined in (6), according to the algorithm below. The resulting sequence of $n$ random numbers is stored in an array that we call the n-vector. Since the MGSN distribution has unbounded support, we use a constant $l_1 > 0$ to limit the generated MGSN values. Furthermore, we consider another constant $l_2 > 0$ corresponding to the maximum value of the MGSN pdf, which must be evaluated at the true parameters.

3.3.1. Algorithm

To start the algorithm, we fix the parameters $\mu$, $\sigma$, $\alpha$, and $\lambda$ of the MGSN distribution and use the following notation:
  • $n$: the length of the n-vector.
  • $Y$: a random variable with $MGSN(\mu,\sigma,\alpha,\lambda)$ distribution.
  • $f_Y(y)$: the MGSN pdf, with $y \in \mathbb{R}$.
  • $l_1$: a bound for the MGSN numbers to be generated, which are restricted to $(-l_1, l_1)$, with $l_1 > 0$.
  • $l_2$: the maximum value of $f_Y$, with $l_2 > 0$.
  • $U_1$: a random variable with a uniform distribution in $(-l_1, l_1)$, $U(-l_1, l_1)$ in short.
  • $U_2$: a random variable with a $U(0, l_2)$ distribution.
Acceptance–rejection algorithm to generate numbers from the $MGSN(\mu,\sigma,\alpha,\lambda)$ distribution:
  • Input: $n$, $\mu$, $\sigma$, $\alpha$, $\lambda$.
  • Output: n-vector.
  1. Set $l_2 = \max_{y}\{f_Y(y)\}$;
  2. Generate a value $u_1$ from $U_1 \sim U(-l_1, l_1)$;
  3. Generate a value $u_2$ from $U_2 \sim U(0, l_2)$;
  4. If $u_2 \leq f_Y(u_1)$, set $y = u_1$ as a value from $Y \sim MGSN(\mu,\sigma,\alpha,\lambda)$ and append $y$ to the n-vector; otherwise, go back to step 2;
  5. Repeat steps 2–4 until the length of the n-vector is equal to $n$.
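The acceptance–rejection steps above can be sketched as follows. This is a minimal illustration in Python rather than the R implementation used in the study. Two details are assumptions of the sketch: the bound $l_2$ is approximated by a grid search over $(-l_1, l_1)$ and inflated by a small safety factor so that it does not fall below the true maximum, and $l_1$ is assumed large enough that the density outside $(-l_1, l_1)$ is negligible. All function and argument names are hypothetical.

```python
import math
import random

def mgsn_pdf(y, mu, sigma, alpha, lam):
    """MGSN density of Equation (6)."""
    w = (y - mu) / sigma
    normal_pdf = math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)
    normal_cdf = 0.5 * (1.0 + math.erf(lam * w / math.sqrt(2.0)))
    return 2.0 * (1.0 + alpha * w ** 4) * normal_pdf * normal_cdf / (sigma * (1.0 + 3.0 * alpha))

def sample_mgsn(n, mu, sigma, alpha, lam, l1, rng, grid=20001, safety=1.05):
    """Draw n values from MGSN restricted to (-l1, l1) by acceptance-rejection."""
    # Step 1: approximate l2 = max f_Y on a grid, inflated by the safety factor.
    step = 2.0 * l1 / (grid - 1)
    l2 = safety * max(mgsn_pdf(-l1 + i * step, mu, sigma, alpha, lam)
                      for i in range(grid))
    n_vector = []
    while len(n_vector) < n:               # Step 5: repeat until n values are kept
        u1 = rng.uniform(-l1, l1)          # Step 2: proposal on (-l1, l1)
        u2 = rng.uniform(0.0, l2)          # Step 3: uniform height on (0, l2)
        if u2 <= mgsn_pdf(u1, mu, sigma, alpha, lam):
            n_vector.append(u1)            # Step 4: accept the proposal
    return n_vector

rng = random.Random(2024)  # fixed seed so that the run is reproducible
draws = sample_mgsn(1000, 0.0, 1.0, 1.0, 0.0, 8.0, rng)
print(len(draws), sum(d * d for d in draws) / len(draws))
```

For $\alpha = 1$ and $\lambda = 0$ the printed second sample moment should be near $\mu_2 = (1+15\alpha)/(1+3\alpha) = 4$. The uniform proposal is simple but wasteful when $l_1$ is large, because most proposals fall in the tails and are rejected.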
Computational simulations were performed in the R programming language, using the “optim” function from the “stats” package with the quasi-Newton method “BFGS”. We used a computer with the following characteristics: (i) OS: Windows 10 Pro 64-bit; (ii) RAM: 8 GB; and (iii) Processor: Intel(R) Core(TM) i7-8550U CPU at 1.99 GHz. The algorithm above was run 2000 times with n = 50, 100, 200, and 500; the average processing time was 0.04565 s. Below, we show the MLEs obtained from the $MGSN(\mu,\sigma,\alpha,\lambda)$ model for different parameter values and random sample sizes, using the acceptance–rejection algorithm.

3.3.2. Simulation Results

Table 2 presents the results of the simulation study, illustrating the behavior of the MLEs for 2000 samples of sizes n = 50, 100, 200, and 500 from a population with distribution $MGSN(\mu,\sigma,\alpha,\lambda)$. The parameter estimates are quite close to the true values, and the standard deviations and average lengths of the intervals are small. These results show the expected asymptotic behavior. In addition, the empirical coverages are very close in all cases to the nominal confidence level of 95%.

3.4. Applications for the Normal Case

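In this section, we show two real-data applications of the MGSN model given in (6) and compare the results with the models proposed in [8,9] for the normal and skew-normal cases, denoted GSN and GSN2 and given in (2) and (3), respectively, with location and scale parameters, as follows:
$$f_Y(y;\mu,\sigma,\lambda,\alpha) = \phi\!\left(\frac{y-\mu}{\sigma}\right)\left[\Phi\!\left(\frac{\lambda\,(y-\mu)}{\sigma}+\alpha\right) + \Phi\!\left(\frac{\lambda\,(y-\mu)}{\sigma}-\alpha\right)\right]$$
and
$$g(y;\mu,\sigma,\alpha,\lambda) = \begin{cases} 2\,\Phi\!\left(\dfrac{\lambda\,(y-\mu)}{\sigma}\right)\phi\!\left(w_\alpha\!\left(\dfrac{y-\mu}{\sigma}\right)\right), & y \neq \mu, \\[6pt] \phi(0), & y = \mu. \end{cases}$$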

3.4.1. Application 1

The data used in Application 1 correspond to the age and frequency of a cancer called Kaposi's sarcoma. This is a type of cancer that can form masses in the skin, lymph nodes, or other organs; the data do not distinguish between subtypes. The data were collected from the website of the Office for National Statistics (ONS, Health Statistics section), and they can be seen in Table A1 in the Appendix (see Appendix A). It can be seen that there is a greater incidence in individuals aged around 25 years, as well as in those aged about 60 years. The records were taken during the years 1995 to 2016 and correspond to different regions of the UK.
Table 3 shows descriptive summary measures of the Kaposi's sarcoma data. Table 4 shows the maximum likelihood estimates and their corresponding standard deviations for the GSN2, GSN, and MGSN models. According to the Akaike Information Criterion (AIC) [13] and the Consistent Akaike Information Criterion (CAIC) [14], the MGSN model presents a better fit, since its values are lower. Figure 5 shows the histogram and the fitted GSN2, GSN, and MGSN densities for the Kaposi's sarcoma data set. The graphical representation also suggests that the MGSN model fits the data better.
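The two criteria used for the comparison can be computed from the maximized log-likelihood. The article does not print the formulas, so the sketch below assumes the standard definitions, AIC $= 2k - 2\hat{\ell}$ and CAIC $= -2\hat{\ell} + k(\log n + 1)$, where $k$ is the number of parameters and $n$ the sample size. The log-likelihood values in the example are hypothetical and are not taken from Tables 4, 6, or 8.

```python
import math

def aic(loglik, k):
    """Akaike Information Criterion: 2k - 2*loglik."""
    return 2.0 * k - 2.0 * loglik

def caic(loglik, k, n):
    """Consistent AIC: -2*loglik + k*(log(n) + 1)."""
    return -2.0 * loglik + k * (math.log(n) + 1.0)

# Hypothetical maximized log-likelihoods of two four-parameter models
# fitted to the same n = 272 observations (illustrative values only).
candidates = {"model_A": -1040.0, "model_B": -1032.5}
for name, ll in candidates.items():
    print(name, aic(ll, 4), caic(ll, 4, 272))
```

Lower values indicate a better trade-off between fit and complexity. When the competing models have the same number of parameters and are fitted to the same data, as here, both criteria rank them in the same order as the maximized log-likelihood.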

3.4.2. Application 2

The second data set corresponds to the duration of the Old Faithful geyser eruption (see Appendix A, Table A2) in Yellowstone National Park, WY, USA [15]. Table 5 shows the descriptive summary measures of these data. Table 6 shows the maximum likelihood estimates and their corresponding standard deviations for the GSN2, GSN, and MGSN models. According to the AIC [13] and CAIC [14] criteria, the MGSN model presents a better fit, because its values are smaller. Figure 6 shows the histogram and the fitted GSN2, GSN, and MGSN densities for the eruption time data set. The graphical representation also suggests that the MGSN model fits the data better.

4. Laplace Distribution Case

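Let us consider the particular case of Equation (5) when $h$ and $G$ are, respectively, the density and cumulative distribution functions of the Laplace distribution. If a random variable follows a Modified Generalized Skew Laplace (MGSLP) distribution, we denote it by $Z \sim MGSLP(\mu,\sigma,\alpha,\lambda)$, and its pdf is given by
$$f_Z(z;\mu,\sigma,\alpha,\lambda) = \frac{2}{\sigma\,(1+\alpha\rho_4)}\left[1+\alpha\left(\frac{z-\mu}{\sigma}\right)^{4}\right] h\!\left(\frac{z-\mu}{\sigma}\right) G\!\left(\frac{\lambda\,(z-\mu)}{\sigma}\right), \tag{7}$$
where $z \in \mathbb{R}$, $\mu \in \mathbb{R}$, $\sigma > 0$, $\alpha \geq 0$, $\lambda \in \mathbb{R}$, and $\rho_4$ is the fourth moment of the corresponding skew-Laplace distribution, as in Equation (5). For the standard Laplace density $h(y) = \tfrac{1}{2}e^{-|y|}$ this moment is $\rho_4 = 24$.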

4.1. Simulation Study for the Case of the Laplace Distribution

Table 7 presents the results of the simulation study, illustrating the behavior of the MLEs for 2000 samples of sizes n = 50, 100, 200, and 500 from a population with distribution $MGSLP(\mu,\sigma,\alpha,\lambda)$. The parameter estimates are quite close to the true values, and the standard deviations and average lengths of the intervals are small. These results show the expected asymptotic behavior. In addition, the empirical coverages are very close in all cases to the nominal confidence level of 95%.

4.2. Application for the Laplace Distribution Case

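In this section, we show one real-data application of the MGSLP model given in (7) and compare the results with the models proposed in [8,9] for the Laplace and skew-Laplace cases, denoted GSLP and GSLP2 and given in (2) and (3), respectively, as follows:
$$f_Y(y;\mu,\sigma,\lambda,\alpha) = f\!\left(\frac{y-\mu}{\sigma}\right)\left[F\!\left(\frac{\lambda\,(y-\mu)}{\sigma}+\alpha\right) + F\!\left(\frac{\lambda\,(y-\mu)}{\sigma}-\alpha\right)\right]$$
and
$$g(y;\mu,\sigma,\alpha,\lambda) = \begin{cases} 2\,F\!\left(\dfrac{\lambda\,(y-\mu)}{\sigma}\right) f\!\left(w_\alpha\!\left(\dfrac{y-\mu}{\sigma}\right)\right), & y \neq \mu, \\[6pt] f(0), & y = \mu, \end{cases}$$
where $f$ and $F$ are the density and cumulative distribution functions of the Laplace distribution, respectively.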
For the data corresponding to the duration of the Old Faithful geyser eruption (see Appendix A, Table A2) in Yellowstone National Park, Wyoming, USA [15], Table 8 shows the values of the maximum likelihood estimates and their corresponding standard deviations for the GSLP2, GSLP, and MGSLP models. Using the AIC [13] and CAIC [14] criteria, it can be seen that the MGSLP model presents a better fit because its values are smaller. Figure 7 shows the histogram and graphical representation of the GSLP2, GSLP, and MGSLP models for the eruption time data set. Through the graphical representation, it can be seen that the MGSLP model apparently best fits the eruption time data set.

5. Discussion

We have proposed a new family based on a weighted version of the skew distribution, which has a parameter, α , that allows modeling data sets that present one, two, or three modes. That is, we have a family of models that are more flexible than the distributions proposed by Gómez-Déniz et al. [8,9], considering that these have the same number of parameters. Its density function, moments, and some properties were studied; it should be noted that the mathematical treatment is less complex than other distributions given in the current literature. In particular, when the parameter α takes the value zero the new family recovers the family of skew distributions. Two particular cases of the new model were studied, one for the normal distribution and the other for the Laplace distribution. A simulation algorithm was developed, using the acceptance–rejection method, to obtain random samples of different sizes from the proposed model, for the two particular cases. Subsequently, 2000 iterations were carried out for each of these samples, obtaining the estimates through the maximum likelihood method, using the “optim” function of the R software, for different values of μ , σ , α , and λ . This study allowed us to observe the good asymptotic behavior of the parameter estimates. Two applications were carried out with real data, one related to medicine and the other to the environment, where it was empirically shown that the proposed family fits better than the families presented by Gómez-Déniz et al. [8,9]. This new model is a potential contribution for professionals who work in data analysis and/or users of statistics.

Author Contributions

Data curation, J.R.; formal analysis, J.R., M.A.R., P.L.C., and J.A.; investigation, J.R., M.A.R., and P.L.C.; methodology, J.R., M.A.R., P.L.C., and J.A.; writing—original draft, J.R., M.A.R., P.L.C., and J.A.; writing—review and editing, M.A.R., P.L.C., and J.A.; Funding Acquisition, J.R., M.A.R., and J.A. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the University of Antofagasta for the research of J. Reyes, M. Rojas, and J. Arrué through the Proyecto Semillero UA 2022.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Data corresponding to Kaposi's sarcoma.
Age    Number
1      1
5      89
10     342
15     718
20     2352
25     3593
30     3243
35     2533
40     2015
45     1747
50     1562
55     1662
60     1801
65     1915
70     1855
75     1611
80     1203
85     642
90     247
Table A2. Data corresponding to eruption time.
79 74 65 49 51 49 78 79
54 52 73 83 86 57 46 64
74 48 82 81 53 77 77 75
62 80 56 47 79 68 84 47
85 59 79 84 81 81 49 86
55 90 71 52 60 81 83 63
88 80 62 86 82 73 71 85
85 58 76 81 77 50 80 82
51 84 60 75 76 85 49 57
85 58 78 59 59 74 75 82
54 73 76 89 80 55 64 67
84 83 83 79 49 77 76 74
78 64 75 59 96 83 53 54
47 53 82 81 53 83 94 83
83 82 70 50 77 51 55 73
52 59 65 85 77 78 76 73
62 75 73 59 65 84 50 88
84 90 88 87 81 46 82 80
52 54 76 53 71 83 54 71
79 80 80 69 70 55 75 83
51 54 48 77 81 81 78 56
47 83 86 56 93 57 79 79
78 71 60 88 53 76 78 78
69 64 90 81 89 84 78 84
74 77 50 45 45 77 70 58
83 81 78 82 86 81 79 83
55 59 63 55 58 87 70 43
76 84 72 90 78 77 54 60
78 48 84 45 66 51 86 75
79 82 75 83 76 78 50 81
73 60 51 56 63 60 90 46
77 92 82 89 88 82 54 90
66 78 62 46 52 91 54 46
80 78 88 82 93 53 77 74

References

  1. Azzalini, A. Further results on a class of distributions which includes the normal ones. Statistica 1986, 46, 199–208.
  2. Elal-Olivero, D. Alpha-skew-normal distribution. Proyecciones 2010, 29, 224–240.
  3. Gómez, H.W.; Elal-Olivero, D.; Salinas, H.S.; Bolfarine, H. Bimodal extension based on the skew-normal distribution with application to pollen data. Environmetrics 2011, 22, 50–62.
  4. Venegas, O.; Salinas, H.S.; Gallardo, D.I.; Bolfarine, H.; Gómez, H.W. Bimodality based on the generalized skew-normal distribution. J. Stat. Comput. Simul. 2018, 88, 156–181.
  5. Bolfarine, H.; Martínez-Flórez, G.; Salinas, H.S. Bimodal symmetric-asymmetric power-normal families. Commun. Stat. Theory Methods 2018, 47, 259–276.
  6. Fisher, R.A. The effect of methods of ascertainment upon the estimation of frequencies. Ann. Eugen. 1934, 6, 13–25.
  7. Rao, C.R. On discrete distributions arising out of methods of ascertainment. Sankhyā Indian J. Stat. Ser. A 1965, 27, 311–324.
  8. Gómez-Déniz, E.; Arnold, B.C.; Sarabia, J.M.; Gómez, H.W. Properties and Applications of a New Family of Skew Distributions. Mathematics 2021, 9, 87.
  9. Gómez-Déniz, E.; Calderín-Ojeda, E.; Sarabia, J.M. Bimodal and Multimodal Extensions of the Normal and Skew Normal Distributions. Stat. J. 2023. accepted and available on the internet.
  10. Reyes, J.; Gómez-Déniz, E.; Gómez, H.W.; Calderín-Ojeda, E. A Bimodal Extension of the Exponential Distribution with Applications in Risk Theory. Symmetry 2021, 13, 679.
  11. Reyes, J.; Arrué, J.; Leiva, V.; Martin-Barreiro, C. A New Birnbaum-Saunders Distribution and Its Mathematical Features Applied to Bimodal Real-World Data from Environment and Medicine. Mathematics 2021, 9, 1891.
  12. Henze, N. A probabilistic representation of the Skew-Normal distribution. Scand. J. Stat. 1986, 4, 271–275.
  13. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control. 1974, 19, 716–723.
  14. Bozdogan, H. Model selection and Akaike's Information Criterion (AIC): The general theory and its analytical extensions. Psychometrika 1987, 52, 345–370.
  15. Owen, D. Tables for computing bivariate normal probabilities. Ann. Math. Stat. 1956, 27, 1075–1090.
Figure 1. Plot of MGSN pdf (solid line) and GSN pdf (dashed line) for different values of α and λ .
Figure 2. Plot of the MGSN model for the case λ = 0 and different values of α .
Figure 3. Plots of the skewness (left) and kurtosis (right) of the MGSN distribution.
Figure 4. Profile of coefficient skewness (left) and kurtosis (right) of the MGSN distribution.
Figure 5. MGSN distribution (solid line), GSN distribution (dashed line), and GSN2 distribution (dotted line) for the Kaposis sarcoma data.
Figure 6. Histogram and graphical representation of MGSN distribution (solid line), GSN distribution (dashed line), and GSN2 distribution (dotted line) for the eruption time data.
Figure 7. Histogram for the eruption time data set and the fit of the graphs for the MGSLP (solid line), GSLP (dashed line), and GSLP2 (dotted line) distributions.
Table 1. Skewness and kurtosis coefficients of the MGSN model for different values of α and λ.

| α | Skewness, λ = 3 | Skewness, λ = 2 | Skewness, λ = 0 | Skewness, λ = 2 | Skewness, λ = 3 | Kurtosis, λ = 3 | Kurtosis, λ = 2 | Kurtosis, λ = 0 | Kurtosis, λ = 2 | Kurtosis, λ = 3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.1047 | 0.2326 | 0 | −0.2326 | −0.1047 | 2.7460 | 2.9395 | 1.6875 | 2.9395 | 2.7460 |
| 2 | 0.1413 | 0.2907 | 0 | −0.2907 | −0.1413 | 3.1079 | 3.4253 | 1.5515 | 3.4253 | 3.1079 |
| 3 | 0.1029 | 0.2659 | 0 | −0.2659 | −0.1029 | 3.2536 | 3.6492 | 1.5028 | 3.6492 | 3.2536 |
| 4 | 0.0588 | 0.2307 | 0 | −0.2307 | −0.0588 | 3.3133 | 3.7625 | 1.4778 | 3.7625 | 3.3133 |
| 5 | 0.0195 | 0.1976 | 0 | −0.1976 | −0.0195 | 3.3367 | 3.8247 | 1.4626 | 3.8247 | 3.3367 |
| 6 | −0.0139 | 0.1689 | 0 | −0.1689 | 0.0139 | 3.3436 | 3.8609 | 1.4524 | 3.8609 | 3.3436 |
| 7 | −0.0419 | 0.1444 | 0 | −0.1444 | 0.0419 | 3.3425 | 3.8828 | 1.4450 | 3.8828 | 3.3425 |
| 8 | −0.0656 | 0.1234 | 0 | −0.1234 | 0.0656 | 3.3376 | 3.8962 | 1.4395 | 3.8962 | 3.3376 |
| 9 | −0.0859 | 0.1055 | 0 | −0.1055 | 0.0859 | 3.3310 | 3.9046 | 1.4351 | 3.9046 | 3.3310 |
| 10 | −0.1033 | 0.0899 | 0 | −0.0899 | 0.1033 | 3.3236 | 3.9098 | 1.4316 | 3.9098 | 3.3236 |
| 11 | −0.1184 | 0.0764 | 0 | −0.0764 | 0.1184 | 3.3160 | 3.9129 | 1.4288 | 3.9129 | 3.3160 |
| 12 | −0.1316 | 0.0645 | 0 | −0.0645 | 0.1316 | 3.3085 | 3.9146 | 1.4264 | 3.9146 | 3.3085 |
| 13 | −0.1432 | 0.0540 | 0 | −0.0540 | 0.1432 | 3.3014 | 3.9153 | 1.4244 | 3.9153 | 3.3014 |
| 14 | −0.1536 | 0.0446 | 0 | −0.0446 | 0.1536 | 3.2946 | 3.9154 | 1.4227 | 3.9154 | 3.2946 |
| 15 | −0.1628 | 0.0362 | 0 | −0.0362 | 0.1628 | 3.2882 | 3.9151 | 1.4212 | 3.9151 | 3.2882 |
| 16 | −0.1711 | 0.0287 | 0 | −0.0287 | 0.1711 | 3.2821 | 3.9145 | 1.4199 | 3.9145 | 3.2821 |
| 17 | −0.1786 | 0.0219 | 0 | −0.0219 | 0.1786 | 3.2765 | 3.9137 | 1.4187 | 3.9137 | 3.2765 |
| 18 | −0.1854 | 0.0157 | 0 | −0.0157 | 0.1854 | 3.2711 | 3.9128 | 1.4177 | 3.9128 | 3.2711 |
| 19 | −0.1916 | 0.0100 | 0 | −0.0100 | 0.1916 | 3.2662 | 3.9118 | 1.4167 | 3.9118 | 3.2662 |
| 20 | −0.1973 | 0.0049 | 0 | −0.0049 | 0.1973 | 3.2615 | 3.9107 | 1.4159 | 3.9107 | 3.2615 |
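The coefficients in Table 1 are presumably the usual standardized third and fourth central moments. As a minimal sketch under that assumption, the following Python code computes both coefficients for an arbitrary density by numerical integration on a uniform grid. The MGSN density is not restated in this part of the article, so two stand-in densities are used instead: a standard normal and a symmetric two-component normal mixture. The function names and grid settings are illustrative choices, not the authors' code.

```python
import numpy as np

def moment_coefficients(pdf, lower, upper, num=200_001):
    """Return (skewness, kurtosis) of a density via a Riemann sum on [lower, upper]."""
    x = np.linspace(lower, upper, num)
    dx = x[1] - x[0]
    w = pdf(x)
    w = w / (w.sum() * dx)            # renormalise so the grid weights integrate to 1
    mean = (x * w).sum() * dx
    c = x - mean                      # centred grid
    m2 = (c**2 * w).sum() * dx        # variance
    m3 = (c**3 * w).sum() * dx        # third central moment
    m4 = (c**4 * w).sum() * dx        # fourth central moment
    return m3 / m2**1.5, m4 / m2**2

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density, used here only as a stand-in for the MGSN density."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))

def bimodal_pdf(x):
    """Symmetric mixture 0.5 N(-2, 1) + 0.5 N(2, 1): a bimodal stand-in."""
    return 0.5 * normal_pdf(x, -2.0) + 0.5 * normal_pdf(x, 2.0)

skew_n, kurt_n = moment_coefficients(normal_pdf, -12.0, 12.0)
skew_m, kurt_m = moment_coefficients(bimodal_pdf, -14.0, 14.0)
print(skew_n, kurt_n)   # close to 0 and 3 for the standard normal
print(skew_m, kurt_m)   # close to 0 and 1.72 for the symmetric mixture
```

The mixture illustrates the pattern visible in the λ = 0 column: a symmetric bimodal density has zero skewness and a kurtosis coefficient well below 3.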
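Table 2. Simulation of 2000 iterations for parameter estimates for the model MGSN(μ, σ, α, λ) by the maximum likelihood method.

| n | μ | σ | λ | α | μ̂ | sd(μ̂) | Ali(μ̂) | C(μ̂) | σ̂ | sd(σ̂) | Ali(σ̂) | C(σ̂) | λ̂ | sd(λ̂) | Ali(λ̂) | C(λ̂) | α̂ | sd(α̂) | Ali(α̂) | C(α̂) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 50 | 0 | 1 | −0.5 | 0.4 | 0.0018 | 0.4781 | 1.8743 | 93.55 | 1.0014 | 0.1538 | 0.6028 | 94.10 | −0.5577 | 0.4682 | 1.8354 | 96.30 | 0.5471 | 0.3396 | 1.3314 | 93.80 |
| 100 | 0 | 1 | −0.5 | 0.4 | 0.0090 | 0.3758 | 1.4730 | 95.40 | 1.0045 | 0.1202 | 0.4712 | 95.30 | −0.5226 | 0.2915 | 1.1427 | 96.85 | 0.4688 | 0.1826 | 0.7159 | 94.30 |
| 200 | 0 | 1 | −0.5 | 0.4 | 0.0079 | 0.2796 | 1.0961 | 95.50 | 1.0007 | 0.0919 | 0.3602 | 95.45 | −0.5153 | 0.2800 | 1.0977 | 98.30 | 0.4357 | 0.1102 | 0.4319 | 94.10 |
| 500 | 0 | 1 | −0.5 | 0.4 | 0.0108 | 0.1640 | 0.6428 | 95.80 | 1.0035 | 0.0540 | 0.2118 | 95.35 | −0.5087 | 0.1024 | 0.4016 | 95.90 | 0.4132 | 0.0619 | 0.2427 | 94.25 |
| 50 | 0 | 1 | 0.5 | 2 | 0.0015 | 0.2185 | 0.8564 | 95.50 | 1.0009 | 0.0866 | 0.3394 | 95.25 | 0.5146 | 0.1529 | 0.5995 | 95.75 | 2.4454 | 1.2979 | 5.0879 | 91.50 |
| 100 | 0 | 1 | 0.5 | 2 | 0.0016 | 0.1434 | 0.5621 | 94.60 | 0.9987 | 0.0574 | 0.2250 | 94.50 | 0.5043 | 0.1002 | 0.3928 | 94.90 | 2.3937 | 1.0487 | 4.1107 | 92.75 |
| 200 | 0 | 1 | 0.5 | 2 | 0.0000 | 0.0991 | 0.3883 | 94.50 | 0.9991 | 0.0406 | 0.1590 | 95.10 | 0.5013 | 0.0693 | 0.2715 | 94.95 | 2.2405 | 0.7663 | 3.0038 | 93.35 |
| 500 | 0 | 1 | 0.5 | 2 | −0.0018 | 0.0611 | 0.2394 | 94.85 | 0.9998 | 0.0245 | 0.0962 | 95.05 | 0.5012 | 0.0426 | 0.1670 | 95.85 | 2.0869 | 0.4162 | 1.6316 | 94.15 |
| 50 | 0 | 1 | 1 | 0.5 | 0.0988 | 0.5310 | 2.0815 | 94.40 | 0.9694 | 0.1775 | 0.6956 | 95.50 | 1.2925 | 1.3780 | 5.4016 | 94.45 | 0.6736 | 0.5719 | 2.2417 | 95.80 |
| 100 | 0 | 1 | 1 | 0.5 | 0.0177 | 0.4679 | 1.8342 | 94.05 | 0.9923 | 0.1497 | 0.5870 | 95.40 | 1.2382 | 1.1636 | 4.5613 | 95.95 | 0.6286 | 0.4146 | 1.6253 | 94.75 |
| 200 | 0 | 1 | 1 | 0.5 | 0.0195 | 0.3561 | 1.3958 | 94.20 | 0.9921 | 0.1148 | 0.4500 | 94.50 | 1.0597 | 0.5470 | 2.1441 | 96.85 | 0.5683 | 0.2855 | 1.1191 | 96.20 |
| 500 | 0 | 1 | 1 | 0.5 | 0.0040 | 0.2340 | 0.9171 | 94.75 | 0.9979 | 0.0759 | 0.2974 | 94.60 | 1.0260 | 0.3869 | 1.5166 | 98.95 | 0.5257 | 0.1443 | 0.5657 | 96.05 |
| 50 | 1 | 2 | −0.5 | 0.4 | 0.9502 | 0.9466 | 3.7106 | 94.10 | 1.9923 | 0.3061 | 1.2000 | 95.15 | −0.5542 | 0.6802 | 2.6664 | 97.90 | 0.5401 | 0.3332 | 1.3062 | 94.60 |
| 100 | 1 | 2 | −0.5 | 0.4 | 1.0117 | 0.7657 | 3.0015 | 94.75 | 2.0050 | 0.2456 | 0.9629 | 94.80 | −0.5215 | 0.2852 | 1.1181 | 95.95 | 0.4692 | 0.1814 | 0.7111 | 94.15 |
| 200 | 1 | 2 | −0.5 | 0.4 | 1.0322 | 0.5668 | 2.2219 | 95.80 | 2.0121 | 0.1784 | 0.6992 | 95.05 | −0.5217 | 0.3113 | 1.2204 | 98.85 | 0.4307 | 0.1126 | 0.4415 | 95.60 |
| 500 | 1 | 2 | −0.5 | 0.4 | 1.0152 | 0.3164 | 1.2401 | 95.45 | 2.0032 | 0.1039 | 0.4074 | 95.85 | −0.5061 | 0.0981 | 0.3844 | 95.70 | 0.4127 | 0.0616 | 0.2414 | 94.15 |
| 50 | −1 | 2 | 0.5 | 2 | −0.9740 | 0.4348 | 1.7044 | 94.90 | 1.9900 | 0.1679 | 0.6583 | 94.75 | 0.5118 | 0.1479 | 0.5796 | 94.70 | 2.3971 | 1.2988 | 5.0915 | 91.85 |
| 100 | −1 | 2 | 0.5 | 2 | −0.9940 | 0.2864 | 1.1225 | 94.25 | 1.9954 | 0.1149 | 0.4502 | 95.25 | 0.5084 | 0.1007 | 0.3946 | 94.50 | 2.3808 | 1.0989 | 4.3078 | 92.35 |
| 200 | −1 | 2 | 0.5 | 2 | −0.9934 | 0.1992 | 0.7809 | 95.00 | 1.9963 | 0.0797 | 0.3125 | 95.05 | 0.5045 | 0.0707 | 0.2772 | 95.60 | 2.2129 | 0.7466 | 2.9267 | 93.75 |
| 500 | −1 | 2 | 0.5 | 2 | −0.9965 | 0.1254 | 0.4914 | 95.40 | 1.9993 | 0.0501 | 0.1965 | 94.90 | 0.5015 | 0.0435 | 0.1704 | 95.35 | 2.0710 | 0.4002 | 1.5687 | 94.45 |
| 50 | −1 | 1 | 1 | 0.5 | −0.9152 | 0.5354 | 2.0988 | 95.50 | 0.9668 | 0.1803 | 0.7069 | 95.20 | 1.2361 | 1.1948 | 4.6836 | 93.85 | 0.6921 | 0.6102 | 2.3919 | 95.60 |
| 100 | −1 | 1 | 1 | 0.5 | −0.9782 | 0.4719 | 1.8498 | 94.95 | 0.9921 | 0.1522 | 0.5966 | 95.30 | 1.1948 | 1.0436 | 4.0910 | 96.40 | 0.6291 | 0.4311 | 1.6900 | 95.55 |
| 200 | −1 | 1 | 1 | 0.5 | −0.9917 | 0.3801 | 1.4902 | 93.70 | 0.9959 | 0.1221 | 0.4786 | 93.90 | 1.0810 | 0.6083 | 2.3844 | 96.90 | 0.5725 | 0.2917 | 1.1433 | 96.10 |
| 500 | −1 | 1 | 1 | 0.5 | −1.0064 | 0.2245 | 0.8800 | 94.60 | 1.0007 | 0.0730 | 0.2861 | 94.90 | 1.0203 | 0.2172 | 0.8512 | 94.80 | 0.5274 | 0.1335 | 0.5234 | 94.80 |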
In the above, sd is the standard deviation of the estimates, Ali is the average length of the 95% confidence intervals, and C is the empirical coverage (in %) of those intervals for the respective maximum likelihood estimate (MLE) of each parameter.
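To make the three summaries concrete, here is a minimal Python sketch of how sd, Ali, and C can be computed from replicated estimates. It assumes Wald-type intervals of the form estimate ± 1.96 × standard error, which is consistent with the tabulated values (for example, 2 × 1.96 × 0.4781 ≈ 1.874). Because the MGSN likelihood is not restated here, the replications use a plain normal sample mean as a stand-in model; the function name `coverage_summary` and the seed are hypothetical choices made for illustration.

```python
import numpy as np

def coverage_summary(estimates, std_errors, true_value, z=1.959963984540054):
    """Summarise replicated estimates: mean, sd, average interval length, coverage (%)."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    lower = estimates - z * std_errors          # Wald interval limits (assumption)
    upper = estimates + z * std_errors
    covered = (lower <= true_value) & (true_value <= upper)
    return {
        "mean": estimates.mean(),
        "sd": estimates.std(ddof=1),            # spread of the estimates across replications
        "Ali": (upper - lower).mean(),          # average length of the intervals
        "C": 100.0 * covered.mean(),            # empirical coverage in percent
    }

# Stand-in experiment: MLE of a normal mean, NOT the MGSN model.
rng = np.random.default_rng(2024)
n, mu, sigma, reps = 50, 0.0, 1.0, 2000
samples = rng.normal(mu, sigma, size=(reps, n))
mu_hat = samples.mean(axis=1)
se_hat = samples.std(axis=1, ddof=0) / np.sqrt(n)   # plug-in standard error of the mean
summary = coverage_summary(mu_hat, se_hat, mu)
print(summary)
```

For a well-behaved estimator, C should approach the nominal 95% as n grows, while sd and Ali shrink at roughly the rate 1/√n.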
Table 3. Summary statistics for the Kaposi's sarcoma data set.

| n | Mean | Variance | Asymmetry | Kurtosis |
|---|---|---|---|---|
| 29,131 | 45.396 | 416.387 | 0.313 | 1.936 |
Table 4. Parameter estimates for the GSN2, GSN, and MGSN distributions for the Kaposi's sarcoma data set.

| Parameter Estimates | GSN2 (sd) | GSN (sd) | MGSN (sd) |
|---|---|---|---|
| μ̂ | 37.6241 (0.03552) | 37.029 (0.1313) | 20.5880 (0.1293) |
| σ̂ | 21.0537 (0.0808) | 22.052 (0.1050) | 18.1833 (0.0674) |
| λ̂ | 0.4912 (0.0085) | 4.8080 (0.1180) | 3.9293 (0.0525) |
| α̂ | 0.0754 (0.0017) | 5.525 (0.1350) | 0.2488 (0.00412) |
| AIC | 256,212.1 | 253,832.6 | 249,300.9 |
| CAIC | 256,245.2 | 253,869.7 | 249,334.0 |
Table 5. Summary statistics for the eruption time data set.

| n | Mean | Variance | Asymmetry | Kurtosis |
|---|---|---|---|---|
| 272 | 70.897 | 184.8240 | −0.414 | 1.844 |
Table 6. Parameter estimates for GSN2, GSN, and MGSN distributions for the eruption time data set.

| Parameter Estimates | GSN2 (sd) | GSN (sd) | MGSN (sd) |
|---|---|---|---|
| μ̂ | 65.1850 (0.2520) | 75.5992 (0.1313) | 57.5424 (1.5939) |
| σ̂ | 13.088 (0.5570) | 14.3610 (0.6651) | 9.2529 (0.4983) |
| λ̂ | 0.6760 (0.1160) | −6.2206 (1.9559) | 1.7219 (0.3005) |
| α̂ | 0.4660 (0.0380) | 7.5214 (2.4209) | 1.5183 (0.3818) |
| AIC | 2248.85 | 2142.43 | 2077.92 |
| CAIC | 2266.74 | 2156.95 | 2092.34 |
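For readers who want to compute such criteria themselves, the following Python sketch evaluates AIC and CAIC from a maximized log-likelihood, assuming the textbook definitions AIC = 2k − 2ℓ (Akaike) and CAIC = k(ln n + 1) − 2ℓ (Bozdogan), where k is the number of parameters and n the sample size. The log-likelihood value below is not reported in the article; it is back-solved from the tabulated MGSN AIC under the assumption k = 4, purely for illustration. The tabulated CAIC values may rely on a different penalty convention, so this sketch is not guaranteed to reproduce them.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * k - 2 * log_lik

def caic(log_lik, k, n):
    """Consistent AIC in Bozdogan's form: k * (ln n + 1) - 2 * log-likelihood."""
    return k * (math.log(n) + 1) - 2 * log_lik

# Hypothetical inputs: log_lik is back-solved from AIC = 2077.92 with k = 4 (assumption).
log_lik = -1034.96
k, n = 4, 272

aic_value = aic(log_lik, k)
caic_value = caic(log_lik, k, n)
print(round(aic_value, 2), round(caic_value, 2))
```

Under both criteria, smaller values indicate a better trade-off between fit and complexity; the gap CAIC − AIC equals k(ln n − 1), so CAIC penalizes extra parameters more heavily as n grows.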
Table 7. Simulation of 2000 iterations for parameter estimates for the model MGSLP(μ, σ, α, λ) by the maximum likelihood method.

| n | μ | σ | λ | α | μ̂ | sd(μ̂) | Ali(μ̂) | C(μ̂) | σ̂ | sd(σ̂) | Ali(σ̂) | C(σ̂) | λ̂ | sd(λ̂) | Ali(λ̂) | C(λ̂) | α̂ | sd(α̂) | Ali(α̂) | C(α̂) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 50 | 1 | 1 | 0.5 | 0.1 | 1.0030 | 0.4398 | 1.7242 | 95.45 | 1.0072 | 0.1773 | 0.6948 | 99.25 | 0.6178 | 0.4746 | 1.8606 | 96.20 | 0.1181 | 0.0718 | 0.2814 | 96.40 |
| 100 | 1 | 1 | 0.5 | 0.1 | 0.9940 | 0.3180 | 1.2465 | 95.70 | 1.0058 | 0.1110 | 0.4353 | 99.25 | 0.5466 | 0.2024 | 0.7936 | 95.60 | 0.1082 | 0.0412 | 0.1615 | 96.40 |
| 200 | 1 | 1 | 0.5 | 0.1 | 1.0055 | 0.2534 | 0.9932 | 97.50 | 1.0069 | 0.1382 | 0.5418 | 99.60 | 0.5171 | 0.1196 | 0.4690 | 95.55 | 0.1035 | 0.0239 | 0.0936 | 95.15 |
| 500 | 1 | 1 | 0.5 | 0.1 | 1.0147 | 0.2726 | 1.0685 | 99.25 | 1.0169 | 0.1865 | 0.7313 | 99.20 | 0.5080 | 0.0854 | 0.3347 | 97.90 | 0.1007 | 0.0158 | 0.0619 | 96.86 |
| 50 | 2 | 1 | 0.5 | 0.9 | 2.1006 | 0.5210 | 2.0423 | 93.45 | 0.9929 | 0.1054 | 0.4132 | 95.20 | 0.5452 | 0.2426 | 0.9508 | 96.05 | 0.8656 | 0.5079 | 1.9910 | 98.80 |
| 100 | 2 | 1 | 0.5 | 0.9 | 2.0419 | 0.3774 | 1.4793 | 93.65 | 0.9948 | 0.0765 | 0.2997 | 94.15 | 0.5240 | 0.1396 | 0.5474 | 95.20 | 1.1725 | 0.8377 | 3.2837 | 91.60 |
| 200 | 2 | 1 | 0.5 | 0.9 | 2.0023 | 0.2710 | 1.0623 | 94.75 | 0.9998 | 0.0551 | 0.2158 | 95.10 | 0.5137 | 0.0883 | 0.3460 | 94.65 | 1.1924 | 0.8358 | 3.2763 | 93.56 |
| 500 | 2 | 1 | 0.5 | 0.9 | 2.0042 | 0.1546 | 0.6061 | 94.85 | 0.9988 | 0.0333 | 0.1304 | 95.20 | 0.5038 | 0.0514 | 0.2016 | 95.30 | 0.9972 | 0.3633 | 1.4239 | 94.46 |
| 50 | 0 | 1 | 1.2 | 0.9 | 0.2215 | 0.5439 | 2.1320 | 92.35 | 0.9627 | 0.1136 | 0.4453 | 93.45 | 1.0951 | 0.4731 | 1.8544 | 95.65 | 0.8088 | 0.5186 | 2.0329 | 99.20 |
| 100 | 0 | 1 | 1.2 | 0.9 | 0.0915 | 0.4323 | 1.6945 | 93.85 | 0.9854 | 0.0864 | 0.3387 | 94.20 | 1.2720 | 0.5473 | 2.1455 | 94.15 | 1.0433 | 0.7256 | 2.8445 | 91.65 |
| 200 | 0 | 1 | 1.2 | 0.9 | 0.0370 | 0.3138 | 1.2301 | 93.90 | 0.9939 | 0.0624 | 0.2448 | 94.75 | 1.3338 | 0.5349 | 2.0968 | 95.25 | 1.1157 | 0.7858 | 3.0803 | 94.40 |
| 500 | 0 | 1 | 1.2 | 0.9 | 0.0196 | 0.1928 | 0.7559 | 94.85 | 0.9965 | 0.0384 | 0.1503 | 94.55 | 1.2497 | 0.2424 | 0.9500 | 94.65 | 1.0032 | 0.4455 | 1.7464 | 95.30 |
In the above, sd is the standard deviation of the estimates, Ali is the average length of the 95% confidence intervals, and C is the empirical coverage (in %) of those intervals for the respective maximum likelihood estimate (MLE) of each parameter.
Table 8. Parameter estimates for GSLP2, GSLP, and MGSLP distributions.

| Parameter Estimates | GSLP2 (sd) | GSLP (sd) | MGSLP (sd) |
|---|---|---|---|
| μ̂ | 101.1921 (1.8598) | 73.9999 (0.0278) | 66.9997 (0.0543) |
| σ̂ | 20.3161 (1.3864) | 11.5685 (0.70151) | 2.6399 (0.0748) |
| λ̂ | −8.5204 (4.4833) | −7.9486 (3.9974) | 0.0583 (0.0149) |
| α̂ | 1.0879 (0.0132) | 10.9371 (6.0598) | 3.6810 (1.7438) |
| AIC | 2181.30 | 2148.54 | 2095.638 |
| CAIC | 2195.72 | 2162.96 | 2114.061 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Reyes, J.; Rojas, M.A.; Cortés, P.L.; Arrué, J. A New Multimodal Modification of the Skew Family of Distributions: Properties and Applications to Medical and Environmental Data. Symmetry 2024, 16, 1224. https://doi.org/10.3390/sym16091224
