Abstract
In this study, we introduce a new generalized family of distributions called the Exponentiated Half Logistic-Harris-G (EHL-Harris-G) distribution, which extends the Harris-G distribution. The motivation for introducing this generalized family of distributions lies in its ability to overcome the limitations of previous families, enhance flexibility, improve tail behavior, provide better statistical properties and find applications in several fields. Several statistical properties, including hazard rate function, quantile function, moments, moments of residual life, distribution of the order statistics and Rényi entropy are discussed. Risk measures, such as value at risk, tail value at risk, tail variance and tail variance premium, are also derived and studied. To estimate the parameters of the EHL-Harris-G family of distributions, the following six different estimation approaches are used: maximum likelihood (MLE), least-squares (LS), weighted least-squares (WLS), maximum product spacing (MPS), Cramér–von Mises (CVM), and Anderson–Darling (AD). The Monte Carlo simulation results for EHL-Harris-Weibull (EHL-Harris-W) show that the MLE method allows us to obtain better estimates, followed by WLS and then AD. Finally, we show that the EHL-Harris-W distribution is superior to some other equi-parameter non-nested models in the literature, by fitting it to two real-life data sets from different disciplines.
1. Introduction
As distribution theory develops, it addresses the challenges researchers face, proposing a variety of models for better analyzing and exploring lifetime data sets. Consequently, useful models are necessary to better understand real phenomena in nature. Current trends and practices in constructing new probability models differ greatly from those proposed before 1997. A key objective of developing, extending, or generalizing models, and their classes, is to explain the lifetime phenomenon in a variety of fields, including physics, computer science, insurance, public health, medicine, engineering, biology, industry, communication, life testing, and so on.
Some well-known and fundamental distributions, such as the Uniform, Beta, Exponential, Rayleigh, Weibull, and Gamma, offer only limited flexibility. For example, the exponential distribution can model only a constant hazard function, whereas the Rayleigh distribution can model only an increasing hazard function. The Weibull distribution is more flexible and can model increasing, decreasing, or constant hazard functions, but it cannot model non-monotonic failure rate functions (such as unimodal or bathtub shapes). The mathematical properties of the gamma distribution are difficult to describe because its cumulative distribution function has no closed form. The failure rate of complex phenomena, such as those arising in human mortality, reliability, lifetime testing, engineering modeling, electronic sciences, and biological surveys, may be bathtub-shaped, upside-down bathtub-shaped, or of another form, but is usually not monotonic. In response, researchers have developed several classes and families of generalized distributions for modeling monotonic or non-monotonic failure rates.
Over the years, several new and very useful classes and families of distributions have been developed, as well as generalized distributions obtained by adding one or more parameters to existing distributions. The extra parameters make it possible to control the tail weight and entropy of the resulting density function. Some of the well-known families, to mention just a few, are as follows: the Marshall-Olkin-G, by [1], the beta-G, by [2], the transmuted-G, by [3], the gamma-G, by [4], the Kumaraswamy-G, by [5], the exponentiated generalized-G, by [6], the T-X family, by [7], the logistic-G, by [8], the Weibull-G, by [9], the odd log-logistic-G, by [10], the type II half logistic family of distributions, by [11], the odd log-logistic Topp–Leone G family of distributions, by [12], the type II half logistic Kumaraswamy distribution, by [13], the exponentiated half-logistic odd Lindley-G family of distributions, by [14], the type II Kumaraswamy half logistic family of distributions, by [15], the half logistic log-logistic Weibull distribution, by [16], the half logistic modified Kies exponential distribution, by [17], a class of distributions which includes the normal ones, by [18], the stable symmetric family of distributions, by [19], work towards the establishment of a family of distributions that best fits any data set, by [20], and work on the generation of distribution functions, by [21].
The Exponentiated Half Logistic-G (EHL-G) family of distributions was developed by [22]. The cumulative distribution function (cdf) and probability density function (pdf) of the EHL-G family of distributions are given by:
and
respectively, for , and parameter vector . Pinho et al. [23] developed the Harris-G class of distributions with the cdf and pdf given as:
and
for and parameter vector , respectively. In this study, we introduce a new generalized family of distributions called the Exponentiated Half Logistic-Harris-G (EHL-Harris-G) distribution, which extends the Harris-G distribution by [23].
An empirical study on loss distributions was carried out by [24], using exploratory data analysis and empirical methods to estimate risk. They ruled out the use of exponential, gamma, and Weibull distributions, due to their lack of flexibility and poor results, and said, “one would need to use a model that is flexible enough”. Therefore, it is essential to develop models based on existing distributions, or create new models that are applicable in the areas of financial risk management, insurance and actuarial sciences, climate science, extreme value analysis and portfolio optimization. In light of the premises outlined above, we were motivated to seek out more flexible probability distributions to provide greater accuracy in fitting data. Therefore, several factors motivated the development of the exponentiated half logistic-Harris-G (EHL-Harris-G) family of distributions: (i) its ability to overcome the limitations of previous distributions, enhance flexibility, improve tail behavior, provide better statistical properties, and find applications in several fields; (ii) to define special models with a good number of different shapes of hazard rate function; (iii) to produce skewness for symmetrical models; (iv) to provide consistently better fits than other generalized distributions with the same underlying model; (v) to generalize some existing models in the literature; (vi) to modulate the weight of the tails of any baseline distribution. The areas of applicability for the EHL-Harris-G family of distributions in research include financial risk management, insurance and actuarial sciences, climate science, extreme value analysis and portfolio optimization.
The rest of the work is organized in the following manner. Section 2 presents the novel exponentiated half-logistic-Harris-G (EHL-Harris-G) family of distributions, reliability and hazard rate function, sub-families, linear representation and quantile function. In Section 3, moments, moment generating function, moments of residual life, the distribution of order statistics, and Rényi entropy are presented. Section 4 contains six different estimation methods to estimate the unknown parameters of the EHL-Harris-G family of distributions. In Section 5, some special models from the EHL-Harris-G family of distributions are presented. In Section 6, Monte Carlo simulations are employed to examine the consistency property of six estimation methods for the EHL-Harris-G distribution family. In Section 7, actuarial measures and numerical studies of these measures are presented. Real data applications are given in Section 8, followed by some concluding remarks in Section 9.
2. The New Family of Distributions
In this section, we define the cdf and the pdf of the exponentiated Half Logistic-Harris-G (EHL-Harris-G) family of distributions by considering a case where is in Equation (1). Then the pdf and the cdf of the Harris-G distribution, defined by (3) and (4), are substituted into Equations (1) and (2). The resulting cdf and pdf of the EHL-Harris-G family of distributions are, respectively, given by:
and
for , and parameter vector , where G is the baseline cdf.
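As a rough illustration of the substitution described above, the following Python sketch evaluates an EHL-Harris-type cdf. It rests on stated assumptions rather than on the displayed equations: the EHL-G cdf with its second shape parameter set to 1 is taken to be F = [(1 − S)/(1 + S)]^α, where S is the survival function of the inner distribution, and the Harris-G survival function is taken to be S = [θ(1 − G)^v / (1 − (1 − θ)(1 − G)^v)]^(1/v). The one-parameter Weibull baseline G(x) = 1 − exp(−x^β) and all function names are hypothetical choices made for this example.

```python
import math


def weibull_cdf(x, beta):
    """Assumed one-parameter Weibull baseline: G(x) = 1 - exp(-x**beta)."""
    return 1.0 - math.exp(-(x ** beta)) if x > 0 else 0.0


def harris_survival(g, theta, v):
    """Assumed Harris-G survival function evaluated at baseline cdf value g."""
    gbar = 1.0 - g
    return (theta * gbar ** v / (1.0 - (1.0 - theta) * gbar ** v)) ** (1.0 / v)


def ehl_harris_w_cdf(x, alpha, theta, v, beta):
    """Assumed EHL-Harris-W cdf: [(1 - S) / (1 + S)]**alpha with S the Harris-W survival."""
    s = harris_survival(weibull_cdf(x, beta), theta, v)
    return ((1.0 - s) / (1.0 + s)) ** alpha


if __name__ == "__main__":
    for x in (0.25, 0.5, 1.0, 2.0):
        print(x, ehl_harris_w_cdf(x, alpha=2.0, theta=0.5, v=1.5, beta=1.2))
```

Under these assumed forms, setting θ = 1 collapses the Harris survival function to 1 − G, which is consistent with the reduction to the EHL-G family listed among the sub-families.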
2.1. Reliability Measures
The survival function and hazard rate function (hrf) of the EHL-Harris-G family of distributions are given by:
and
for , and parameter vector , respectively. The cumulative hazard rate function of the EHL-Harris-G family of distributions is given by:
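The three reliability quantities are linked by the general identities S(x) = 1 − F(x), h(x) = f(x)/S(x) and H(x) = −log S(x), which hold for any continuous lifetime distribution. The sketch below encodes these identities for arbitrary cdf and pdf callables; the exponential distribution is used only as a stand-in to check the arithmetic, and the function names are hypothetical.

```python
import math


def survival(cdf, x):
    """Survival function S(x) = 1 - F(x)."""
    return 1.0 - cdf(x)


def hazard(pdf, cdf, x):
    """Hazard rate h(x) = f(x) / S(x)."""
    return pdf(x) / survival(cdf, x)


def cumulative_hazard(cdf, x):
    """Cumulative hazard H(x) = -log S(x)."""
    return -math.log(survival(cdf, x))


# Stand-in model for checking: exponential with rate 2 (constant hazard).
RATE = 2.0


def exp_cdf(x):
    return 1.0 - math.exp(-RATE * x)


def exp_pdf(x):
    return RATE * math.exp(-RATE * x)


if __name__ == "__main__":
    print(hazard(exp_pdf, exp_cdf, 1.3), cumulative_hazard(exp_cdf, 1.3))
```

Any cdf and pdf pair for a member of the family can be passed in place of the exponential stand-in.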
2.2. Sub-Families of EHL-Harris-G Family of Distributions
- When we obtain the exponentiated half logistic-Marshall-Olkin-G family of distributions with the cdf:for and parameter vector This is a new family of distributions.
- When we obtain the half logistic-Harris-G family of distributions with the cdf:for and parameter vector This is a new family of distributions.
- When we obtain a reduced EHL-G family of distributions with the cdf:for and parameter vector See [22] for details.
- When we obtain the half logistic-Marshall-Olkin-G (HL-MO-G) family of distributions with the cdf:for and parameter vector This is a new family of distributions.
- If we obtain a reduced Half Logistic-G (HL-G) family of distributions with the cdf:for parameter vector See [22] for additional details.
- If we obtain a reduced EHL-G family of distributions with the cdf:for and parameter vector See [22] for details.
2.3. Linear Representation
In this sub-section, we express the pdf of the EHL-Harris-G family of distributions as an infinite linear combination of exponentiated-G (Exp-G) densities. Making use of the following generalized series expansions,
the pdf of the EHL-Harris-G family of distributions can be expressed as follows:
where is the Exp-G pdf with the power parameter and parameter vector and
Thus, the statistical properties of the EHL-Harris-G family of distributions can be obtained from those of the Exp-G family of distributions.
2.4. Quantile Function
The quantile function is a very useful quantity in many statistical applications and Monte Carlo methods. It is used to generate random numbers from a distribution and is obtained by inverting the cdf of the distribution. Let X be a random variable following the EHL-Harris-G family of distributions. Then, the quantile function of X can be obtained as follows:
for , that is,
Consequently, the quantile function of the EHL-Harris-G family of distributions is given by:
Consequently, using Equation (14), we can generate random variates from the EHL-Harris-G family of distributions for a specified baseline cdf G.
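The inversion idea can be sketched in Python as follows. This is an illustration under assumptions, not the paper's Equation (14): the cdf is taken to be F = [(1 − S)/(1 + S)]^α with Harris-type survival S = [θ(1 − G)^v / (1 − (1 − θ)(1 − G)^v)]^(1/v), and the baseline is an assumed one-parameter Weibull, G(x) = 1 − exp(−x^β). Solving F(x) = u step by step gives t = u^(1/α), S = (1 − t)/(1 + t), 1 − G = S/(θ + (1 − θ)S^v)^(1/v), and finally x = (−log(1 − G))^(1/β). All function names are hypothetical.

```python
import math
import random


def ehl_harris_w_cdf(x, alpha, theta, v, beta):
    """Assumed EHL-Harris-W cdf (see the lead-in for the assumed form)."""
    if x <= 0:
        return 0.0
    gbar = math.exp(-(x ** beta))  # baseline survival 1 - G(x)
    s = (theta * gbar ** v / (1.0 - (1.0 - theta) * gbar ** v)) ** (1.0 / v)
    return ((1.0 - s) / (1.0 + s)) ** alpha


def ehl_harris_w_quantile(u, alpha, theta, v, beta):
    """Invert the assumed cdf in closed form for 0 < u < 1."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    t = u ** (1.0 / alpha)
    s = (1.0 - t) / (1.0 + t)  # Harris-type survival value
    gbar = s / (theta + (1.0 - theta) * s ** v) ** (1.0 / v)  # baseline survival
    return (-math.log(gbar)) ** (1.0 / beta)  # Weibull baseline quantile


def sample(n, alpha, theta, v, beta, seed=0):
    """Inverse-transform sampling: plug uniform draws into the quantile function."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()
        if u > 0.0:  # random() may return exactly 0.0, which is outside (0, 1)
            draws.append(ehl_harris_w_quantile(u, alpha, theta, v, beta))
    return draws


if __name__ == "__main__":
    print(sample(5, alpha=2.0, theta=0.5, v=1.5, beta=1.2))
```

A quick sanity check is the round trip: evaluating the cdf at the quantile of u should return u up to rounding error.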
3. Statistical Properties
In this section, we provide some of the statistical properties of the EHL-Harris-G family of distributions, such as moments, the moment generating function, moments of residual life, distribution of order statistics and Rényi entropy. Throughout this section, we consider without any loss of generality to be the pdf of the EHL-Harris-G family of distributions.
3.1. Moments and Generating Function
3.2. Order Statistics
In reliability theory and quality control testing, order statistics play a vital role in predicting the time to failure of components. Suppose are independent and identically distributed random variables from the EHL-Harris-G family of distributions. The pdf of the order statistic of X from the EHL-Harris-G pdf can be written as
Using Equations (5) and (6), we have
where is the Exp-G pdf with the power parameter and parameter vector and
Thus, by substituting (16) into (15), the pdf of the order statistic of X from the EHL-Harris-G family of distributions can be written as:
This shows that the pdf of order statistic of X from the EHL-Harris-G family of distributions can be obtained from those of the Exp-G family of distributions.
3.3. Rényi Entropy
Rényi entropy is a measure of randomness or uncertainty in the system. It is mostly used in information theory. Rényi entropy is defined as:
Note that
Rényi entropy for the EHL-Harris-G family of distributions is given by:
for where is the Rényi entropy of Exp-G distribution with power parameter and
Thus, the Rényi entropy of the EHL-Harris-G family of distributions can be obtained from that of the Exp-G family of distributions.
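For a numerical cross-check of a series-based entropy expression, the definition can be evaluated directly. The sketch below assumes the usual definition of Rényi entropy of order ν (ν > 0, ν ≠ 1), namely (1 − ν)^(−1) log ∫ f(x)^ν dx, and approximates the integral with the trapezoidal rule on a truncated interval. The exponential pdf is only a stand-in whose entropy is known in closed form, and the function names and integration limits are hypothetical.

```python
import math


def renyi_entropy(pdf, nu, lower, upper, steps=20000):
    """Approximate (1 - nu)^(-1) * log(integral of pdf**nu) by the trapezoidal rule."""
    if nu <= 0 or nu == 1:
        raise ValueError("nu must be positive and different from 1")
    h = (upper - lower) / steps
    total = 0.5 * (pdf(lower) ** nu + pdf(upper) ** nu)
    for k in range(1, steps):
        total += pdf(lower + k * h) ** nu
    return math.log(total * h) / (1.0 - nu)


def exp_pdf(x):
    """Stand-in density: standard exponential."""
    return math.exp(-x)


if __name__ == "__main__":
    # For the standard exponential and nu = 2 the exact value is log(2).
    print(renyi_entropy(exp_pdf, nu=2.0, lower=0.0, upper=20.0), math.log(2.0))
```

The truncation point must be chosen so that the neglected tail of f^ν is negligible for the density at hand.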
3.4. Moment of Residual and Reversed Residual Life
Moments of the residual life distribution are extensively used in reliability analysis. The moment of the residual life, say of a random variable X is:
Consequently, for the EHL-Harris-G family of distributions is given as follows:
where is as defined in Equation (11) and denotes the Exp-G distribution with power parameter . By setting in the formula above, we obtain the mean excess function of the EHL-Harris-G family of distributions. The moment of the reversed residual life, say , of a random variable X is:
Subsequently, for the EHL-Harris-G family of distributions is given as follows:
where is as defined in Equation (11) and denotes the Exp-G distribution with power parameter . If we set from the above formula, we derive the mean inactivity time of the EHL-Harris-G family of distributions.
4. Parameter Estimation
In statistical analysis, estimating the unknown parameters for a given sample is an important concept. In this section, we estimate the unknown parameters of the EHL-Harris-G family of distributions by using six different estimation methods. Let denote the vector of model parameters.
4.1. Maximum Likelihood Estimation
By assuming is the sample of size, n, obtained from the EHL-Harris-G family of distributions, the log-likelihood function for has the form:
The maximum likelihood estimates (MLEs) of , denoted by , can be obtained by solving the nonlinear system of equations, , numerically with respect to the parameters, using a numerical method, such as the Newton–Raphson procedure, since the equations have no closed-form solution. The partial derivatives of the log-likelihood function with respect to each component of the parameter vector are given in Appendix A.
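To make the numerical step concrete, the sketch below maximizes a log-likelihood with a simple derivative-free coordinate pattern search. This is a swapped-in technique chosen because it needs no derivatives; it is not the Newton–Raphson procedure mentioned above. The exponential density and the four data values are placeholders used only because the exact MLE (one over the sample mean) is known; the function names are hypothetical.

```python
import math


def log_likelihood(pdf, data, params):
    """Sum of log densities; returns -inf if any density is non-positive."""
    total = 0.0
    for x in data:
        d = pdf(x, *params)
        if d <= 0.0 or not math.isfinite(d):
            return -math.inf
        total += math.log(d)
    return total


def pattern_search(objective, start, step=0.5, tol=1e-8, max_iter=10000):
    """Maximize objective by trying +/- step moves along each coordinate,
    halving the step whenever no move improves the objective."""
    point = list(start)
    best = objective(point)
    for _ in range(max_iter):
        improved = False
        for j in range(len(point)):
            for delta in (step, -step):
                trial = list(point)
                trial[j] += delta
                value = objective(trial)
                if value > best:
                    point, best, improved = trial, value, True
                    break
        if not improved:
            step /= 2.0
            if step < tol:
                break
    return point, best


def exp_pdf(x, rate):
    """Placeholder density: exponential with the given rate."""
    return rate * math.exp(-rate * x) if rate > 0 and x >= 0 else 0.0


if __name__ == "__main__":
    data = [0.5, 1.0, 1.5, 2.0]  # illustrative values, not from the paper
    estimate, value = pattern_search(lambda p: log_likelihood(exp_pdf, data, p), [1.0])
    print(estimate, value)  # the exact MLE is 1 / mean(data) = 0.8
```

For a four-parameter model, the same routine accepts a four-element starting vector, although a dedicated optimizer would normally converge faster.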
4.2. Methods of Least Square and Weighted Least Square
Swain et al. [25] proposed the least-square (LS) and weighted least-square (WLS) estimation methods to estimate the beta distribution parameters. In the LS method, the unknown parameters are obtained by minimizing the following function:
For the WLS method, the unknown parameters are determined by minimization of the following function:
By solving the nonlinear equations numerically, values of the parameter estimates are obtained.
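The two criteria can be sketched in Python as follows, assuming the usual forms from the least-squares estimation literature: the LS criterion is the sum over i of [F(x_(i)) − i/(n + 1)]^2 and the WLS criterion weights each squared term by (n + 1)^2 (n + 2) / [i(n − i + 1)], where x_(1) ≤ ... ≤ x_(n) are the ordered observations. The cdf is passed in as a callable, and the function names are hypothetical.

```python
import math


def ls_objective(cdf, data):
    """Least-squares criterion: sum of [F(x_(i)) - i/(n+1)]^2."""
    xs = sorted(data)
    n = len(xs)
    return sum((cdf(x) - i / (n + 1)) ** 2 for i, x in enumerate(xs, start=1))


def wls_objective(cdf, data):
    """Weighted least-squares criterion with weights (n+1)^2 (n+2) / (i (n-i+1))."""
    xs = sorted(data)
    n = len(xs)
    return sum(
        (n + 1) ** 2 * (n + 2) / (i * (n - i + 1)) * (cdf(x) - i / (n + 1)) ** 2
        for i, x in enumerate(xs, start=1)
    )


if __name__ == "__main__":
    # Illustrative data: exact standard-exponential quantiles at i/(n+1), n = 9.
    data = [-math.log(1.0 - i / 10) for i in range(1, 10)]
    good = lambda x: 1.0 - math.exp(-x)        # matching cdf
    bad = lambda x: 1.0 - math.exp(-3.0 * x)   # mismatched cdf
    print(ls_objective(good, data), ls_objective(bad, data))
    print(wls_objective(good, data), wls_objective(bad, data))
```

Minimizing either function over the model parameters, with the cdf rebuilt for each candidate parameter vector, gives the corresponding estimates.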
4.3. Method of Maximum Product of Spacing
Cheng and Amin [26] introduced the maximum product spacing (MPS) method as an alternative to the method of maximum likelihood for computing parameters for continuous univariate distributions. In the MPS method, the unknown parameters are obtained by maximizing the following function:
Solving the nonlinear equations we obtain the maximum product of spacing estimates of the parameters of the EHL-Harris-G family of distributions when the baseline cdf G is specified.
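A minimal sketch of the MPS criterion, assuming its usual form: the average of log[F(x_(i)) − F(x_(i−1))] over i = 1, ..., n + 1, with the conventions F(x_(0)) = 0 and F(x_(n+1)) = 1. The cdf is supplied as a callable, and the function name is hypothetical.

```python
import math


def mps_objective(cdf, data):
    """Mean log-spacing of the fitted cdf values at the ordered observations."""
    xs = sorted(data)
    n = len(xs)
    values = [0.0] + [cdf(x) for x in xs] + [1.0]
    total = 0.0
    for lo, hi in zip(values, values[1:]):
        spacing = hi - lo
        if spacing <= 0.0:  # tied observations give a zero spacing
            return -math.inf
        total += math.log(spacing)
    return total / (n + 1)


if __name__ == "__main__":
    # Illustrative data: exact standard-exponential quantiles at i/(n+1), n = 9.
    data = [-math.log(1.0 - i / 10) for i in range(1, 10)]
    good = lambda x: 1.0 - math.exp(-x)
    bad = lambda x: 1.0 - math.exp(-3.0 * x)
    print(mps_objective(good, data), mps_objective(bad, data))
```

With equal spacings of 1/(n + 1) the criterion attains its largest possible value, log(1/(n + 1)), which is why the matching cdf scores higher than the mismatched one here.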
4.4. Method of Cramér–Von Mises
The Cramér–von Mises (CVM) method compares the empirical distribution function with a hypothesized theoretical distribution. In this method, the unknown parameters are obtained by minimizing the following function:
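A minimal sketch of the CVM criterion, assuming its usual form: 1/(12n) plus the sum over i of [F(x_(i)) − (2i − 1)/(2n)]^2, where x_(1) ≤ ... ≤ x_(n) are the ordered observations. The function name and the test data are hypothetical.

```python
import math


def cvm_objective(cdf, data):
    """Cramér-von Mises criterion: 1/(12n) + sum of [F(x_(i)) - (2i-1)/(2n)]^2."""
    xs = sorted(data)
    n = len(xs)
    return 1.0 / (12 * n) + sum(
        (cdf(x) - (2 * i - 1) / (2 * n)) ** 2 for i, x in enumerate(xs, start=1)
    )


if __name__ == "__main__":
    # Illustrative data: standard-exponential quantiles at (2i-1)/(2n), n = 10.
    n = 10
    data = [-math.log(1.0 - (2 * i - 1) / (2 * n)) for i in range(1, n + 1)]
    good = lambda x: 1.0 - math.exp(-x)
    bad = lambda x: 1.0 - math.exp(-3.0 * x)
    print(cvm_objective(good, data), cvm_objective(bad, data))
```

The criterion cannot fall below 1/(12n); that lower bound is reached only when every fitted cdf value equals its plotting position exactly.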
4.5. Method of Anderson–Darling
The Anderson–Darling (AD) estimation method obtains unknown parameters by minimizing the following function:
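A minimal sketch of the AD criterion, assuming its usual form: −n − (1/n) times the sum over i of (2i − 1)[log F(x_(i)) + log(1 − F(x_(n+1−i)))], where x_(1) ≤ ... ≤ x_(n) are the ordered observations. The function name and the test data are hypothetical.

```python
import math


def ad_objective(cdf, data):
    """Anderson-Darling criterion evaluated at the ordered observations."""
    xs = sorted(data)
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        # xs[i - 1] is x_(i); xs[n - i] is x_(n + 1 - i).
        total += (2 * i - 1) * (
            math.log(cdf(xs[i - 1])) + math.log(1.0 - cdf(xs[n - i]))
        )
    return -n - total / n


if __name__ == "__main__":
    # Illustrative data: standard-exponential quantiles at (2i-1)/(2n), n = 10.
    n = 10
    data = [-math.log(1.0 - (2 * i - 1) / (2 * n)) for i in range(1, n + 1)]
    good = lambda x: 1.0 - math.exp(-x)
    bad = lambda x: 1.0 - math.exp(-3.0 * x)
    print(ad_objective(good, data), ad_objective(bad, data))
```

Because of the logarithms, the criterion penalizes discrepancies in both tails more heavily than the CVM criterion does.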
5. Some Special Cases
In this section, we introduce three special cases of the EHL-Harris-G family of distributions by generalizing the Weibull, Rayleigh and Uniform distributions.
5.1. Exponentiated Half Logistic-Harris-Weibull (EHL-Harris-W) Distribution
If we consider the Weibull distribution with cdf and pdf given by and , respectively, for and , then, the EHL-Harris-W distribution has cdf and pdf given by:
and
for and , respectively. The hrf of the EHL-Harris-W distribution is given by:
for and
Figure 1 shows the plots of the pdf and the hrf of the EHL-Harris-W distribution for different parameter values. The pdf can take various shapes, including uni-modal and left- or right-skewed. Graphs of the hrf exhibit increasing, decreasing, bathtub, upside-down bathtub followed by bathtub, and bathtub followed by upside-down bathtub shapes. Figure 2 shows that the EHL-Harris-W distribution can model data sets with different levels of skewness and kurtosis.
Figure 1.
Plots of the pdf and hrf of the EHL-Harris-W distribution.
Figure 2.
Three-dimensional plots of the skewness and kurtosis of the EHL-Harris-W distribution for some selected parameter values.
It can be observed that,
- When and v are fixed, the skewness and kurtosis of EHL-Harris-W are both decreasing functions of and .
- When and are fixed, the skewness and kurtosis of EHL-Harris-W are both decreasing functions of and v.
5.2. Exponentiated Half Logistic-Harris-Rayleigh (EHL-Harris-R) Distribution
If we consider the Rayleigh distribution with cdf and pdf given by and , respectively, for and , then, the EHL-Harris-R distribution has cdf and pdf given by:
and
for and , respectively. The corresponding hrf is given by:
for and
Figure 3 shows the plots of the pdf and the hrf of the EHL-Harris-R distribution for different parameter values. The pdf can take various shapes, such as uni-modal and left- or right-skewed. Graphs of the hrf exhibit different shapes, such as increasing, decreasing, bathtub, bathtub followed by upside-down bathtub, and upside-down bathtub followed by bathtub shapes. Figure 4 shows that the EHL-Harris-R distribution can model data sets with different levels of skewness and kurtosis.
Figure 3.
Plots of the pdf and hrf of the EHL-Harris-R distribution.
Figure 4.
Three-dimensional plots of the skewness and kurtosis of the EHL-Harris-R distribution for some selected parameter values.
It can be observed that,
- When and v and are fixed, both the skewness and kurtosis of EHL-Harris-R decrease for varying values of and .
- When and are fixed, both the skewness and kurtosis of EHL-Harris-R decrease for varying values of and v.
5.3. Exponentiated Half Logistic-Harris-Uniform (EHL-Harris-U) Distribution
If we consider the Uniform distribution with cdf and pdf given by and , respectively, for and , then, the EHL-Harris-U distribution has cdf and pdf given by:
and
for and , respectively. The hrf of the EHL-Harris-U distribution is given by:
for and
Figure 5 shows the plots of the pdf and the hrf of the EHL-Harris-U distribution for different parameter values. The pdf for the EHL-Harris-U distribution can be J, reverse-J, uni-modal, right-skewed, and U-shaped. Graphs of the hrf exhibit increasing, decreasing, bathtub, and upside-down bathtub followed by bathtub shapes.
Figure 5.
Plots of the pdf and hrf of the EHL-Harris-Uniform distribution.
6. Monte Carlo Simulation Study
The performance of the six estimation methods discussed in Section 4 was examined for the EHL-Harris-W distribution through simulations with different sample sizes (n = 25, 50, 100, 200, 400, 800) in the R software. We simulated samples for the true parameter values of , given in Table 1 and Table 2. The tables list the average bias (ABIAS) and root mean squared error (RMSE) of the six estimation methods (MLE, LS, WLS, MPS, CVM, and AD) for the different sample sizes. The ABIAS and RMSE for the estimated parameter, say, , are, respectively, given by:
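As a small illustration of how the two summaries are computed from the replicated estimates, the sketch below assumes the common definitions: ABIAS is the mean of (estimate − true value) over the N replications and RMSE is the square root of the mean of (estimate − true value)^2. The function names and the three example estimates are hypothetical.

```python
import math


def average_bias(estimates, true_value):
    """Mean signed deviation of the replicated estimates from the true value."""
    return sum(e - true_value for e in estimates) / len(estimates)


def root_mean_squared_error(estimates, true_value):
    """Square root of the mean squared deviation from the true value."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))


if __name__ == "__main__":
    estimates = [1.1, 0.9, 1.2]  # illustrative replicated estimates
    print(average_bias(estimates, 1.0), root_mean_squared_error(estimates, 1.0))
```

In a full study these functions would be applied to each parameter separately, for each estimation method and sample size.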
Table 1.
Simulation Results for Different Estimation Methods for .
Table 2.
Simulation Results for Different Estimation Methods for .
In Table 1 and Table 2, the row indicating ∑ Ranks corresponds to the partial sum of the ranks. Among all the estimators for a given metric, the superscript indicates the rank. Table 1 presents, for example, the ABIAS of , obtained via the MLE method, as for . This indicates that the ABIAS of obtained using the MLE method ranks second among all other estimators. So, when , in comparison with all other estimators, MLE provided the second best ABIAS of .
Table 3 shows the partial and overall ranks of all the estimation methods of the EHL-Harris-W distribution for various model parameter values. Based on the results in Table 1 and Table 2, the EHL-Harris-W distribution was stable, as the ABIAS and RMSE values for its four parameters were modest. The bias decreased with increasing sample size only in some cases, whereas the RMSE decreased as the sample size increased for all estimation methods. Figure 6 and Figure 7 demonstrate how the RMSEs of the parameters decreased with increasing sample size for each estimation method. It appears that, for large sample sizes, all estimation methods yielded small bias and mean squared error. Table 3 shows that the MLE and WLS methods performed almost equally well in estimating the EHL-Harris-W parameters, with MLE ranked first, followed by WLS and, then, by AD. According to the rankings, the CVM method performed the least well.
Table 3.
Partial and Overall Ranks of all Estimation Methods of EHL-Harris-W Distribution by Various Model Parameter Values.
Figure 6.
Plots of RMSEs of parameters in Table 1.
Figure 7.
Plots of RMSEs of parameters in Table 2.
7. Actuarial Measures
In this section, we present some risk measures, which are frequently used by financial and actuarial practitioners, to evaluate the exposure to market risk in a portfolio of instruments, and, specifically, the following: value at risk (VaR), tail value at risk (TVaR), tail variance (TV) and tail variance premium (TVP).
7.1. Value at Risk (VaR)
VaR is a widely used actuarial measure, which measures financial market risk. It is also known as the quantile risk measure, or the quantile premium principle, and it is always given with a specified confidence level, for example (, or . The VaR for the EHL-Harris-G family of distributions is given by:
where is a specified level of significance.
7.2. Tail Value at Risk (TVaR)
The TVaR is also an important and widely used actuarial measure to quantify the expected value of loss were an event outside the specified probability level to occur. It is also known as the conditional tail expectation (CTE), or tail conditional expectation (TCE). For the EHL-Harris-G family of distributions, the TVaR is given by:
is given by Equation (11).
7.3. Tail Variance (TV)
The tail variance is one of the most important actuarial measures which looks at the variance beyond the VaR. The TV of the EHL-Harris-G family of distributions can be defined as:
where is the Exp-G pdf with the power parameter and parameter vector and is given by Equation (11). Thus, the TV of the EHL-Harris-G family of distributions can be obtained from those of the Exp-G distribution.
7.4. Tail Variance Premium (TVP)
The TVP is an important actuarial measure that plays an essential role in insurance sciences. The TVP of the EHL-Harris-G family of distributions takes the form:
where . The TVP of the EHL-Harris-G family of distributions can be obtained by substituting the Equations (25) and (26) into Equation (27).
7.5. Numerical Study for the Risk Measures
The numerical simulations for VaR, TVaR, TV and TVP are presented in this sub-section. The VaR, TVaR, TV and TVP of the Exponentiated Half Logistic-Harris-W (EHL-Harris-W) distribution were compared with those of its sub-models, namely, HL-Harris-W, EHL-MO-W, HL-W and EHL-W, to assess the role of the additional parameters in modulating the weight of the tails of the newly proposed distribution, and with those of two non-nested models, the alpha power Topp-Leone Weibull (APTLW) distribution, by [27], and the alpha power log-logistic distribution (APExLLD), by [28].
The simulation results were obtained as follows:
1. Random samples of size were generated from each one of the used distributions and parameters estimated via the maximum likelihood method.
2. One thousand repetitions were made to calculate the VaR, TVaR, TV and TVP for these distributions.
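The four measures can also be approximated directly from a simulated sample, which is a convenient check on model-based values. The sketch below is an empirical stand-in, not the procedure used in the paper. It assumes the usual definitions: VaR at level q is the q-th quantile, TVaR is the mean of the losses beyond VaR, TV is the variance of those losses about TVaR, and TVP = TVaR + δ·TV for a chosen δ between 0 and 1. The order-statistic quantile rule, the function name and the example data are hypothetical.

```python
import math


def empirical_risk_measures(losses, q, delta):
    """Return (VaR, TVaR, TV, TVP) computed from a sample of losses."""
    xs = sorted(losses)
    n = len(xs)
    var_q = xs[max(math.ceil(q * n) - 1, 0)]  # order statistic at rank ceil(q * n)
    tail = [x for x in xs if x > var_q]       # losses strictly beyond VaR
    if not tail:
        raise ValueError("no observations beyond VaR; lower q or enlarge the sample")
    tvar = sum(tail) / len(tail)
    tv = sum((x - tvar) ** 2 for x in tail) / len(tail)
    tvp = tvar + delta * tv
    return var_q, tvar, tv, tvp


if __name__ == "__main__":
    losses = list(range(1, 11))  # illustrative losses 1, 2, ..., 10
    print(empirical_risk_measures(losses, q=0.5, delta=0.5))
```

Averaging such sample-based values over many simulated samples gives Monte Carlo approximations that can be set against the model-based figures.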
Table 4 and Table 5 show the numerical findings of VaR, TVaR, TV and TVP for the compared distributions. A model with higher values of VaR, TVaR, TV and TVP is said to have a heavier tail. The results given in Table 4 and Table 5 show that the EHL-Harris-W distribution had heavier tails than those of the HL-Harris-W, EHL-MO-W, HL-W, EHL-W, APTLW and APExLLD distributions, since the computed risk figures of the EHL-Harris-W distribution were higher.
Table 4.
Simulation Results of VaR, TVaR, TV and TVP.
Table 5.
Simulation Results of VaR, TVaR, TV and TVP.
8. Applications
In this section, we demonstrate the importance and flexibility of the EHL-Harris-G family of distributions by applying its special case, namely, the EHL-Harris-W distribution, to two real data sets. The NLMIXED procedure in SAS was used to estimate the model parameters and the package AdequacyModel in R software was used for goodness-of-fit statistics. See Appendix A for some R code. We present the estimated values of the parameters (standard errors in parentheses), the −2log-likelihood statistic , the Akaike Information Criterion the Bayesian Information Criterion and the Consistent Akaike Information Criterion , where is the value of the likelihood function evaluated at the parameter estimates, n is the number of observations, and p is the number of estimated parameters. Further presented are the following goodness-of-fit statistics: Cramér–von Mises () and Anderson–Darling Statistics (), described by [29], as well as the Kolmogorov–Smirnov (K-S) statistic and its p-value. Let be the cdf, where the form of F is known, but the k-dimensional parameter vector, , is unknown. We can obtain the statistics and as follows: (i) Compute where the ’s are in ascending order; (ii) Compute where is the standard normal cdf and its inverse; (iii) Compute where and (iv) Calculate and (v) Modify into and into For the value of the log-likelihood function at its maximum (), a larger value is preferred, and for AIC, AICC, BIC, and the goodness-of-fit statistics , and , smaller values are preferred. The results are presented in Table 6 and Table 7.
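Steps (i) to (v) can be sketched in Python as follows. The formulas are assumptions based on the commonly cited form of these modified statistics: v_i = F(x_(i)); y_i = Φ^(−1)(v_i); u_i = Φ((y_i − ȳ)/s_y) with s_y the sample standard deviation; W² = Σ[u_i − (2i − 1)/(2n)]² + 1/(12n); A² = −n − (1/n)Σ[(2i − 1) log u_i + (2n + 1 − 2i) log(1 − u_i)]; W* = W²(1 + 0.5/n); A* = A²(1 + 0.75/n + 2.25/n²). The exponential cdf fitted by its sample mean is only a stand-in for a fitted model, the data are the first ten mercury values listed in Section 8.2, and the function name is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def modified_cvm_ad(cdf, data):
    """Return (W*, A*) following steps (i)-(v) under the assumed formulas."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    y = [std_normal.inv_cdf(cdf(x)) for x in xs]               # steps (i) and (ii)
    y_bar, s_y = mean(y), stdev(y)                             # stdev uses n - 1
    u = [std_normal.cdf((yi - y_bar) / s_y) for yi in y]       # step (iii)
    w2 = 1.0 / (12 * n) + sum(                                 # step (iv)
        (ui - (2 * i - 1) / (2 * n)) ** 2 for i, ui in enumerate(u, start=1)
    )
    a2 = -n - sum(
        (2 * i - 1) * math.log(ui) + (2 * n + 1 - 2 * i) * math.log(1.0 - ui)
        for i, ui in enumerate(u, start=1)
    ) / n
    w_star = w2 * (1.0 + 0.5 / n)                              # step (v)
    a_star = a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)
    return w_star, a_star


if __name__ == "__main__":
    data = [1.007, 1.447, 0.763, 2.010, 1.346, 1.243, 1.586, 0.821, 1.735, 1.396]
    rate = 1.0 / mean(data)  # stand-in exponential fit, not the EHL-Harris-W fit
    print(modified_cvm_ad(lambda x: 1.0 - math.exp(-rate * x), data))
```

To compare candidate models, the same function would be called with each fitted cdf in turn, and smaller values of both statistics indicate a better fit.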
Table 6.
Parameter Estimates and Goodness-of-fit Statistics for Various Models Fitted for Survival Times of Chemotherapy Patients Data.
Table 7.
Parameter Estimates and Goodness-of-fit Statistics for Various Models Fitted for Level of Mercury Data.
The EHL-Harris-W distribution was compared with the following distributions: Exponentiated Half-logistic Odd Lindley–Weibull (EHL-OL-W), by [14], Topp-Leone-Harris-Log-logistic (TL-Harris-LLoG), by [30], Harris Extended Burr XII (HEBXII), by [31], Exponentiated Half-logistic Odd Burr III-Exponential (EHL-OBIII-E), by [32], Odd Exponentiated Half-logistic Burr XII (OEHL-BXII), by [33], and Harris Extended Lomax (HEL), by [31]. The pdfs of the EHL-OL-W, TL-Harris-LLoG, HEBXII, EHL-OBIII-E, OEHL-BXII and HEL distributions are provided below:
for ,
for ,
for ,
for ,
for , and
for
We present plots of the fitted densities, the histogram of the data and probability plots for each example to show how well our model fits the observed data sets. To obtain the probability plot, we plotted against , where are the ordered values of the observed data. The measures of closeness are given by the sum of squares These plots are shown in Figure 8 and Figure 9. Figure 10 and Figure 11 show the profile log-likelihood plots for parameters of the EHL-Harris-W distribution on both data sets. The TTT scaled plots, estimated cdfs and K–M survival curves are presented in Figure 12 and Figure 13, respectively for both data sets.
Figure 8.
Histogram, fitted density and probability plots for survival times of chemotherapy patients data.
Figure 9.
Histogram, fitted density and probability plots for level of mercury data.
Figure 10.
Profile log-likelihood function plots for parameters of EHL-Harris-W distribution on survival times of chemotherapy patients data.
Figure 11.
Profile log-likelihood function plots for parameters of EHL-Harris-W distribution on level of mercury data.
Figure 12.
Estimated cdf, Kaplan–Meier survival and scaled TTT–Transform plots for the EHL-Harris-W distribution for survival times of chemotherapy patients data.
Figure 13.
Estimated cdf, Kaplan–Meier survival and scaled TTT–Transform plots for the EHL-Harris-W distribution for level of mercury data.
8.1. Survival Times of Chemotherapy Patients Data
The first data set was a subset of the data reported by [34,35], and it corresponded to the survival times (in years) of a group of 46 patients who received chemotherapy alone. This example was chosen due to the critical importance of survival analysis in medical research and clinical trials. This application can aid in the making of informed decisions about treatment protocols, identifying high-risk patient groups, and evaluating the effectiveness of different therapeutic strategies. The observations were as follows: 0.047, 0.115, 0.121, 0.132, 0.164, 0.197, 0.203, 0.260, 0.282, 0.296, 0.334, 0.395, 0.458, 0.466, 0.501, 0.507, 0.529, 0.534, 0.540, 0.641, 0.644, 0.696, 0.841, 0.863, 1.099, 1.219, 1.271, 1.326, 1.447, 1.485, 1.553, 1.581, 1.589, 2.178, 2.343, 2.416, 2.444, 2.825, 2.830, 3.578, 3.658, 3.743, 3.978, 4.003, 4.033. The estimated variance–covariance matrix for the EHL-Harris-W model on survival times of chemotherapy patients data is given by:
and the 95% two-sided asymptotic confidence intervals for and are given by and , respectively.
Table 6 indicates that the EHL-Harris-W distribution had the highest p-value for the K–S statistic and the lowest values for all the goodness-of-fit statistics. Thus, we concluded that the EHL-Harris-W model performed better on the survival times of chemotherapy patients data than the non-nested EHL-OL-W, TL-Harris-LLoG, HEBXII, EHL-OBIII-E, OEHL-BXII and HEL models. Further, Figure 8 shows that our model outperformed the competing non-nested models. Figure 10 shows the profile log-likelihood plots for parameters of the EHL-Harris-W distribution on survival times of chemotherapy patients data. This shows that the MLEs of the EHL-Harris-W distribution can be uniquely determined.
In Figure 12, we see that the cdf line for the EHL-Harris-W distribution, indicated by the blue line, was closer to the empirical cdf, while the survival function in blue was also close to the Kaplan–Meier (K–M) curve, which indicated that our model was the best in explaining the survival times of chemotherapy patients data. The TTT plot for survival times of chemotherapy patients data indicated a non-monotonic hazard rate function; hence, the survival times of chemotherapy patients data can be fitted to our model.
8.2. Level of Mercury Data
The second data set refers to the level of mercury in 34 albacore caught in the Eastern Mediterranean, obtained from [36]. The level of mercury in seafood is an important concern, as it can have adverse effects on human health; particularly, for pregnant women and young children. Therefore, accurately modeling and understanding the distribution of mercury levels can provide valuable information for policymakers and public health officials to make informed decisions regarding fish consumption guidelines and seafood safety regulations. The observations were as follows: 1.007, 1.447, 0.763, 2.010, 1.346, 1.243, 1.586, 0.821, 1.735, 1.396, 1.109, 0.993, 2.007, 1.373, 2.242, 1.647, 1.350, 0.948, 1.501, 1.907, 1.952, 0.996, 1.433, 0.866, 1.049, 1.665, 2.139, 0.534, 1.027, 1.678, 1.214, 0.905, 1.525, 0.763.
The estimated variance–covariance matrix for the EHL-Harris-W model on level of mercury data set is given by:
and the 95% two-sided asymptotic confidence intervals for and are given by and , respectively.
Table 7 indicates that the EHL-Harris-W distribution had the highest p-value for the K-S statistic and the lowest values of all the goodness-of-fit statistics. The EHL-Harris-W model, therefore, worked better with the level of mercury data than did the EHL-OL-W, TL-Harris-LLoG, HEBXII, EHL-OBIII-E, OEHL-BXII and HEL models. In addition, Figure 9 shows that our model outperformed the competing non-nested models on the level of mercury data. Figure 11 shows the profile log-likelihood plots for parameters of the EHL-Harris-W distribution on level of mercury data. This shows that the MLEs of the EHL-Harris-W distribution can be uniquely determined.
In Figure 13, we see that the fitted cdf of the EHL-Harris-W distribution, indicated in blue, was close to the empirical cdf, while the fitted survival function, also in blue, was close to the Kaplan–Meier (K–M) curve, which indicated that our model was the best in explaining the level of mercury data. The TTT plot for the level of mercury data indicated an increasing hazard rate function; hence, the level of mercury data can be fitted to our model.
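The TTT diagnostic mentioned above can be sketched as follows. The code computes the scaled total time on test transform, T(i/n) = (x_(1) + ... + x_(i) + (n − i) x_(i)) / (x_(1) + ... + x_(n)), for the level of mercury observations listed in Section 8.2. The function name `scaled_ttt` is an assumption introduced here, and this is not necessarily the routine used to produce the plot in the paper.

```r
# Level of mercury data (34 observations, as listed in Section 8.2).
mercury <- c(1.007, 1.447, 0.763, 2.010, 1.346, 1.243, 1.586, 0.821, 1.735,
             1.396, 1.109, 0.993, 2.007, 1.373, 2.242, 1.647, 1.350, 0.948,
             1.501, 1.907, 1.952, 0.996, 1.433, 0.866, 1.049, 1.665, 2.139,
             0.534, 1.027, 1.678, 1.214, 0.905, 1.525, 0.763)

# Scaled TTT transform evaluated at i/n, i = 1, ..., n (hypothetical helper).
scaled_ttt <- function(x) {
  x <- sort(x)
  n <- length(x)
  i <- seq_len(n)
  (cumsum(x) + (n - i) * x) / sum(x)
}

ttt <- scaled_ttt(mercury)
u <- seq_along(mercury) / length(mercury)

# Share of points lying on or above the diagonal T(u) = u.
share_above <- mean(ttt >= u)
print(share_above)
```

Plotting `ttt` against `u` together with the diagonal gives the TTT plot. A curve that is concave and lies above the diagonal is the usual indication of an increasing hazard rate, whereas a convex curve below the diagonal points to a decreasing one.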
9. Concluding Remarks
This paper develops and presents a novel family of generalized distributions, called the Exponentiated Half Logistic-Harris-G (EHL-Harris-G) distribution. The motivation for introducing this generalized family of distributions lies in its ability to overcome the limitations of previous families, enhance flexibility, improve tail behavior, provide better statistical properties and find applications in several fields. The EHL-Harris-G family contains several new distributions as special cases or sub-models. For specific baseline cdfs, the hazard rate functions of the EHL-Harris-G family of distributions exhibit a wide variety of shapes. The moments, distribution of order statistics, and Rényi entropy are also expressed in closed form. Six estimation approaches were used to estimate the parameters of the EHL-Harris-G family of distributions: maximum likelihood, least-squares, weighted least-squares, maximum product spacing, Cramér–von Mises, and Anderson–Darling estimation. Monte Carlo simulations were used to evaluate the consistency properties of the six estimation methods for a special case, the EHL-Harris-W distribution. The simulation results showed that the MLE and WLS methods performed almost equally well in estimating the EHL-Harris-W parameters, with MLE ranked first, followed by WLS and then AD. According to the rankings, the CVM method performed worst. Risk measures and numerical studies of the measures are also presented. Finally, we showed that the EHL-Harris-W distribution is superior to some other equi-parameter non-nested models in the literature by fitting it to two real-life data sets from different disciplines.
In the future, we will develop bivariate extensions of this new family of distributions, which can be achieved through various methods, including the use of copulas and probability generating functions. Furthermore, obtaining the distributions of the sums, products, and ratios of random variables will be considered. In addition, future research will incorporate Bayesian estimation methods to estimate model parameters, since Bayesian methods can be applied to almost all parametric models and are becoming more popular across disciplines.
Author Contributions
Conceptualization, B.O.; Methodology, B.O., T.M. and W.S.; Software, G.W.-L., B.O. and T.M.; Formal analysis, B.O., G.W.-L. and T.M.; Writing—review & editing, B.O., G.W.-L. and T.M. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not Applicable.
Informed Consent Statement
Not Applicable.
Data Availability Statement
Not Applicable.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A
In this section, R codes for the EHL-Harris-W distribution are presented. They compute the cdf, pdf, hrf, quantile function, moments, maximum likelihood estimates, variance-covariance matrix and goodness-of-fit statistics.
# packages assumed by this listing: bbmle provides mle2();
# goodness.fit() is assumed to be the AdequacyModel function
library(bbmle)
library(AdequacyModel)
# cdf of the EHL-Harris-W distribution
EHL_Harris_W_cdf = function(x, alpha, v, delta, lambda) {
  G = 1 - exp(-x^lambda)                             # baseline Weibull cdf
  C = (1 - G)^v
  A = ((delta * C) / (1 - (1 - delta) * C))^(1 / v)  # Harris-W survival function
  y = ((1 - A) / (1 + A))^alpha
  return(y)
}
# pdf of the EHL-Harris-W distribution
EHL_Harris_W_pdf = function(x, alpha, v, delta, lambda) {
  g = lambda * x^(lambda - 1) * exp(-x^lambda)       # baseline Weibull pdf
  G = 1 - exp(-x^lambda)
  C = (1 - G)^v
  A = ((delta * C) / (1 - (1 - delta) * C))^(1 / v)
  y = 2 * alpha * (delta^(1 / v)) * g * ((1 - (1 - delta) * C)^(-(1 + 1 / v))) *
    ((1 - A)^(alpha - 1)) * ((1 + A)^(-(alpha + 1)))
  return(y)
}
# hrf of the EHL-Harris-W distribution
EHL_Harris_W_hrf = function(x, alpha, v, delta, lambda) {
  y = EHL_Harris_W_pdf(x, alpha, v, delta, lambda) / (1 - EHL_Harris_W_cdf(x, alpha, v, delta, lambda))
  return(y)
}
# quantile function of the baseline Weibull cdf G(x) = 1 - exp(-x^lambda)
# (called below but not shown in the original listing; this definition is an assumption)
quantile_Weibull = function(p, lambda) {
  (-log(1 - p))^(1 / lambda)
}
# quantile function of the EHL-Harris-W distribution (note: this name masks stats::quantile)
quantile = function(alpha, v, delta, lambda, u) {
  result <- quantile_Weibull(1 - ((((1 - u^(1 / alpha)) / (1 + u^(1 / alpha)))^(-v)) * delta + (1 - delta))^(-1 / v), lambda)
  return(result)
}
# moments of the EHL-Harris-W distribution
moment_EHL_Harris_W = function(alpha, v, delta, lambda, r) {
  f = function(x, alpha, v, delta, lambda, r) {
    (x^r) * (EHL_Harris_W_pdf(x, alpha, v, delta, lambda))
  }
  y = integrate(f, lower = 0, upper = Inf, subdivisions = 100,
                alpha = alpha, v = v, delta = delta, lambda = lambda, r = r)
  return(y)
}
# maximum likelihood estimates and variance-covariance matrix
# x must hold the observed data as a numeric vector before the next lines are run
EHL_Harris_W_LL <- function(alpha, v, delta, lambda) {
  # negative log-likelihood, written via the pdf defined above
  -sum(log(EHL_Harris_W_pdf(x, alpha, v, delta, lambda)))
}
mle.results <- mle2(EHL_Harris_W_LL, start = list(alpha = 0.6883, v = 39.6139, delta = 216.05, lambda = 0.8904),
                    hessian.opt = TRUE, optimizer = "optim", method = "BFGS")
summary(mle.results)
vcov(mle.results)
# goodness-of-fit statistics
# AdequacyModel's goodness.fit() expects pdf(par, x) and cdf(par, x), so the functions above are wrapped
EHL_Harris_W_pdf_par = function(par, x) EHL_Harris_W_pdf(x, par[1], par[2], par[3], par[4])
EHL_Harris_W_cdf_par = function(par, x) EHL_Harris_W_cdf(x, par[1], par[2], par[3], par[4])
goodness.fit(pdf = EHL_Harris_W_pdf_par, cdf = EHL_Harris_W_cdf_par, mle = c(0.6883, 39.6139, 216.05, 0.8904),
             data = x, domain = c(0, 5))
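Before using such functions for fitting, it can help to check them numerically. The sketch below restates the cdf, pdf and quantile function in compact form for the baseline G(x) = 1 − exp(−x^lambda), then checks that the pdf integrates to one, that the quantile function inverts the cdf, and that inverse-transform sampling returns finite positive values. The function names (`harris_surv`, `ehl_cdf`, `ehl_pdf`, `ehl_quantile`) and the parameter values are assumptions chosen for illustration; they are not the fitted estimates from either data set.

```r
# Survival function of the Harris-W model, with (1 - G(x))^v = exp(-v * x^lambda).
harris_surv <- function(x, v, delta, lambda) {
  C <- exp(-v * x^lambda)
  (delta * C / (1 - (1 - delta) * C))^(1 / v)
}

ehl_cdf <- function(x, alpha, v, delta, lambda) {
  A <- harris_surv(x, v, delta, lambda)
  ((1 - A) / (1 + A))^alpha
}

ehl_pdf <- function(x, alpha, v, delta, lambda) {
  C <- exp(-v * x^lambda)
  A <- harris_surv(x, v, delta, lambda)
  g <- lambda * x^(lambda - 1) * exp(-x^lambda)   # baseline Weibull pdf
  2 * alpha * delta^(1 / v) * g * (1 - (1 - delta) * C)^(-(1 + 1 / v)) *
    (1 - A)^(alpha - 1) * (1 + A)^(-(alpha + 1))
}

# Closed-form inverse of ehl_cdf.
ehl_quantile <- function(u, alpha, v, delta, lambda) {
  w <- u^(1 / alpha)
  A <- (1 - w) / (1 + w)
  surv_G <- (delta * A^(-v) + (1 - delta))^(-1 / v)   # equals 1 - G(x)
  (-log(surv_G))^(1 / lambda)
}

# Arbitrary illustrative parameter values (NOT estimates from the paper).
alpha <- 2; v <- 2; delta <- 0.5; lambda <- 1.5

# Check 1: the pdf should integrate to (approximately) one.
total_mass <- integrate(ehl_pdf, lower = 0, upper = Inf,
                        alpha = alpha, v = v, delta = delta, lambda = lambda)$value

# Check 2: the quantile function should invert the cdf.
p <- c(0.1, 0.5, 0.9)
round_trip <- ehl_cdf(ehl_quantile(p, alpha, v, delta, lambda),
                      alpha, v, delta, lambda)

# Check 3: inverse-transform sampling.
set.seed(1)
draws <- ehl_quantile(runif(1000), alpha, v, delta, lambda)

print(total_mass)
print(round_trip)
```

The same closed-form quantile function is what makes Monte Carlo studies of this kind inexpensive: simulated samples are obtained by applying it to uniform random numbers, with no numerical root finding.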
References
1. Marshall, A.; Olkin, I. A new method for adding a parameter to a family of distributions with application to the exponential and Weibull families. Biometrika 1997, 84, 641–652.
2. Eugene, N.; Lee, C.; Famoye, F. Beta-normal distribution and its applications. Commun. Stat.-Theory Methods 2002, 31, 497–512.
3. Shaw, W.T.; Buckley, I.R. The alchemy of probability distributions: Beyond Gram-Charlier expansions, and a skew-kurtotic-normal distribution from a rank transmutation map. arXiv 2009, arXiv:0901.0434.
4. Zografos, K.; Balakrishnan, N. On families of beta and generalized gamma generated distribution and associated inference. Stat. Methodol. 2009, 6, 344–362.
5. Cordeiro, G.M.; de Castro, M. A new family of generalized distributions. J. Stat. Comput. Simul. 2011, 81, 883–898.
6. Cordeiro, G.M.; Ortega, E.M.; da Cunha, D.C. The exponentiated generalized class of distributions. J. Data Sci. 2013, 11, 1–27.
7. Alzaatreh, A.; Lee, C.; Famoye, F. A new method for generating families of continuous distributions. Metron 2013, 71, 63–79.
8. Torabi, H.; Montazeri, N.H. The logistic-uniform distribution and its applications. Commun. Stat.-Simul. Comput. 2014, 43, 2551–2569.
9. Bourguignon, M.; Silva, R.B.; Cordeiro, G.M. The Weibull-G family of probability distributions. J. Data Sci. 2014, 12, 53–68.
10. Alizadeh, M.; MirMostafaee, S.; Ortega, E.M.; Ramires, T.G.; Cordeiro, G.M. The odd log-logistic logarithmic generated family of distributions with applications in different areas. J. Stat. Distrib. Appl. 2017, 4, 6.
11. Soliman, A.H.; Elgarhy, M.A.E.; Shakil, M. Type II half logistic family of distributions with applications. Pak. J. Stat. Oper. Res. 2017, 13, 245–264.
12. Alizadeh, M.; Lak, F.; Rasekhi, M.; Ramires, T.G.; Yousof, H.M.; Altun, E. The odd log-logistic Topp–Leone G family of distributions: Heteroscedastic regression models and applications. Comput. Stat. 2018, 33, 1217–1244.
13. ZeinEldin, R.A.; Haq, M.A.U.; Hashmi, S.; Elsehety, M.; Elgarhy, M. Type II half logistic Kumaraswamy distribution with applications. J. Funct. Spaces 2020, 2020, 1343596.
14. Sengweni, W.; Oluyede, B.; Makubate, B. The exponentiated half-logistic odd Lindley-G family of distributions with applications. J. Nonlinear Sci. Appl. 2021, 14, 287–309.
15. El-Sherpieny, E.S.A.; Elsehetry, M.M. Type II Kumaraswamy half logistic family of distributions with applications to exponential model. Ann. Data Sci. 2019, 6, 1–20.
16. Moakofi, T.; Oluyede, B.; Makubate, B. The half logistic log-logistic Weibull distribution: Model, properties and applications. Eurasian Bull. Math. 2022, 4, 186–210.
17. Alghamdi, S.M.; Shrahili, M.; Hassan, A.S.; Gemeay, A.M.; Elbatal, I.; Elgarhy, M. Statistical inference of the half logistic modified Kies exponential model with modeling to engineering data. Symmetry 2023, 15, 586.
18. Azzalini, A. A class of distributions which includes the normal ones. Scand. J. Stat. 1985, 12, 171–178.
19. Barakat, H.M. A new method for adding two parameters to a family of distributions with application to the normal and exponential families. Stat. Methods Appl. 2015, 24, 359–372.
20. Barakat, H.M.; Khaled, O.M. Towards the establishment of a family of distributions that best fits any data set. Commun. Stat.-Simul. Comput. 2017, 46, 6129–6143.
21. AL-Hussaini, E.K.; Abdel-Hamid, A.H. Generation of Distribution Functions: A Survey. J. Stat. Appl. Probab. 2018, 7, 91–103.
22. Cordeiro, G.M.; Alizadeh, M.; Ortega, E.M. The exponentiated half-logistic family of distributions: Properties and applications. J. Probab. Stat. 2014, 2014, 864396.
23. Pinho, L.G.B.; Cordeiro, G.M.; Nobre, J.S. On the Harris-G class of distributions: General results and application. Braz. J. Probab. Stat. 2015, 29, 813–832.
24. Dutta, K.; Perry, J. A Tale of Tails: An Empirical Analysis of Loss Distribution Models for Estimating Operational Risk Capital (No. 06-13); Working Papers; Econstor: Singapore, 2006.
25. Swain, J.J.; Venkatraman, S.; Wilson, J.R. Least-squares estimation of distribution functions in Johnson’s translation system. J. Stat. Comput. Simul. 1988, 29, 271–297.
26. Cheng, R.C.H.; Amin, N.A.K. Maximum Product-of-Spacings Estimation with Applications to the Log-Normal Distribution; Mathematical Report 79-1; University of Wales IST: Cardiff, Wales, 1979.
27. Benkhelifa, L. Alpha power Topp-Leone Weibull distribution: Properties, characterizations, regression modeling and applications. J. Stat. Manag. Syst. 2022, 25, 1–26.
28. Teamah, A.E.A.; Elbanna, A.A.; Gemeay, A.M. Heavy-tailed log-logistic distribution: Properties, risk measures and applications. Stat. Optim. Inf. Comput. 2021, 9, 910–941.
29. Chen, G.; Balakrishnan, N. A general purpose approximate goodness-of-fit test. J. Qual. Technol. 1995, 27, 154–161.
30. Oluyede, B.; Dingalo, N.; Chipepa, F. The Topp-Leone-Harris-G family of distributions with applications. Int. J. Math. Oper. Res. 2023, 24, 554–582.
31. Batsidis, A.; Lemonte, A.J. On the Harris extended family of distributions. Statistics 2015, 49, 1400–1421.
32. Oluyede, B.; Peter, P.O.; Ndwapi, N.; Bindele, H. The exponentiated half-logistic odd Burr III-G: Model, properties and applications. Pak. J. Stat. Oper. Res. 2022, 18, 33–57.
33. Aldahlan, M.; Afify, A.Z. The odd exponentiated half-logistic Burr XII distribution. Pak. J. Stat. Oper. Res. 2018, 14, 305–317.
34. Bekker, A.; Roux, J.J.J.; Mostert, P.J. A generalization of the compound Rayleigh distribution: Using a Bayesian method on cancer survival times. Commun. Stat.-Theory Methods 2000, 29, 1419–1433.
35. Stablein, D.M.; Carter, W.H., Jr.; Novak, J.W. Analysis of survival data with non-proportional hazard functions. Control. Clin. Trials 1981, 2, 149–159.
36. Mol, S.; Ozden, O.; Karakulak, S. Levels of selected metals in albacore (Thunnus alalunga, Bonnaterre, 1788) from the Eastern Mediterranean. J. Aquat. Food Prod. Technol. 2012, 21, 111–117.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).