Abstract
Probability distributions are very useful in modeling lifetime datasets. However, no specific distribution is suitable for all kinds of datasets. In this study, the bounded truncated Cauchy power exponential distribution is proposed for modeling datasets on the unit interval. The probability density function exhibits desirable shapes, such as left-skewed, right-skewed, reversed J, and bathtub shapes, whereas the hazard rate function displays J and bathtub shapes. For the purpose of modeling dependence between measures in a dataset, a bivariate extension of the proposed distribution is developed. The bivariate probability density function displays monotonic and non-monotonic shapes, making it suitable for modeling complex bivariate relations. Subsequently, the applications of the distribution are illustrated using COVID-19 data. The results revealed that the new distribution provides a better fit to the datasets compared to other existing distributions. Finally, a new quantile regression model is developed and its application demonstrated. The generated quantile regression model offers a decent fit to the data, according to the residual analysis.
1. Introduction
Disease modeling and prediction are primary tasks of epidemiologists and researchers interested in the estimation of disease occurrences. To perform these tasks, modeling the variability in disease occurrences using probability distributions is essential. With the emergence of the novel coronavirus disease in late 2019 (COVID-19) and its negative impact on humanity, many researchers have proposed new probability distributions (discrete or continuous) for modeling the number of infections, mortality rate, and recovery rates, among others. Some of the proposed probability distributions or families of distributions include: Marshall–Olkin reduced Kies distribution [1], modified inverse Weibull distribution [2], weighted Weibull distribution [3], type I half logistic Burr X-G family [4], unit power Weibull distribution [5], new extended exponentiated Weibull distribution [6], discrete extended odd Weibull exponential distribution [7], odd Weibull inverse Topp–Leone distribution [8], log-logistic tangent distribution [9], discrete-type half-logistic exponential distribution [10], and unit Johnson distribution [11].
Among these probability distributions used for modeling diseases, those defined on the unit interval play a major role due to their usefulness in areas such as health, psychology, and epidemiology, among others. For instance, researchers may be interested in modeling mortality or recovery rates. Observations measured on these variables are usually proportions, fractions, or rates, which are defined on the unit interval. Although the beta distribution is the oldest distribution for modeling datasets measured on the unit interval, the intractability of its cumulative distribution function (CDF) and quantile function has called for the development of new distributions with tractable CDFs and quantile functions that are also capable of modeling data on the unit interval. Unit distributions proposed recently in the literature include: unit Gamma/Gompertz distribution [12], bounded odd inverse Pareto exponential distribution [13], bounded shifted Gompertz distribution [14], unit modified Burr-III distribution [15], unit generalized half normal distribution [16], unit Lindley distribution [17], unit Gompertz distribution [18], logit slash distribution [19], unit Weibull distribution [20] and unit inverse Gaussian distribution [21].
Despite the existence of many unit distributions in the literature, no single distribution is capable of modeling all forms of data, since data generating processes produce data with different characteristics such as symmetry, skewness, varying degrees of kurtosis, and monotonic or non-monotonic failure rates. This study thus proposes a new unit distribution called the bounded truncated Cauchy power exponential (BTCPE) distribution. The motivations for developing the new distribution are as follows: to provide a model capable of modeling complex data on the unit interval that exhibit platykurtic, leptokurtic, reversed J, left-skewed, right-skewed, bathtub, and J shapes; to develop a bivariate distribution for modeling interdependence between random data on the unit interval; and to develop a quantile regression model for understanding the relationship between a response variable and given covariates.
The remainder of the paper is organized as follows: Section 2 presents the development of the BTCPE distribution, Section 3 describes some of its important properties, Section 4 focuses on a special bivariate extension of the BTCPE distribution, Section 5 is devoted to the parametric estimation methods, Section 6 presents the Monte Carlo simulation of nine frequentist estimation methods, Section 7 contains the univariate applications of the BTCPE distribution, Section 8 is about the quantile regression model and its application, and finally the conclusion of the paper is presented in Section 9.
2. Bounded Truncated Cauchy Power Exponential Distribution
A random variable follows the truncated Cauchy power exponential (TCPE) distribution if its CDF and probability density function (PDF), respectively, are defined as
and
The TCPE distribution can be presented as a special case of the TCP Weibull distribution proposed by [22]. Now, we define a new unit distribution, called the BTCPE distribution, corresponding to the distribution of . The associated CDF is obtained as follows:
Hence, the CDF of the BTCPE distribution is expressed as
and and are the shape parameters that have to be estimated. The associated PDF of the BTCPE distribution is obtained by differentiating Equation (3), and it is given by
Often, the PDFs are expressed in expanded form for easy derivation of the statistical properties of the proposed distribution. The expanded form of the PDF of the BTCPE distribution is mainly obtained using the generalized binomial expansion, , where is any real number. Thus, it is given by
The corresponding hazard rate function (HRF) is given by
The shapes of the PDF and HRF for some given parameter values are shown in Figure 1. The PDF exhibits symmetric, bathtub, left-skewed and right-skewed shapes for the given parameter values. The HRF displays bathtub and increasing failure rates.
Figure 1.
PDF (left) and HRF (right) of the BTCPE distribution.
3. Some Important Properties
This section presents some relevant properties of the BTCPE distribution.
3.1. Distribution Inequalities
This subsection investigates some desirable inequalities satisfied by the CDF of the BTCPE distribution. These inequalities are essential in determining the first-order stochastic dominance of random variables [23].
Proposition 1.
The CDF of the BTCPE distribution is increasing with respect to the parameter. The CDF of the BTCPE distribution is decreasing with respect to the parameter.
Proof.
For the first point, since , for , we have
This means that is increasing with respect to . For the second point, since , for , we have
This implies that is decreasing with respect to . This completes the proof of the proposition. From Proposition 1, the following stochastic ordering property follows immediately: if then . Also, if then . ☐
3.2. Quantile Function
The quantile function or the inverse CDF is simply the solution of the following nonlinear equation: , for all . Thus, after some algebraic manipulation, we have
The median is obtained by substituting . The quantile function plays an important role in the generation of random observations from the BTCPE distribution. The quantile function values are also useful in computing measures of skewness and kurtosis. As a classical quantile measure, the MacGillivray measure of skewness [24] is given by
In particular, the MacGillivray measure of skewness can be used to efficiently describe the effect of the parameters on the skewness. The more the shapes of vary according to the parameters, the more flexible the skewness is. Figure 2 shows the plot of this skewness measure for a fixed value of while varies and for a fixed value of while varies. From Figure 2, the wider variations seen imply that both parameters have a strong influence on the skewness of the BTCPE distribution. In addition, as the values of or increase, gets closer to the horizontal line. This shows that utilizing higher values of the parameter can result in a symmetrical distribution.
Figure 2.
Plots of the MacGillivray skewness.
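As an illustrative sketch (not the authors' code), the MacGillivray skewness can be evaluated from any quantile function using its standard form, [Q(1-u) + Q(u) - 2Q(1/2)] / [Q(1-u) - Q(u)]. Because the BTCPE quantile function is not reproduced here, the Kumaraswamy quantile function is used as a stand-in; the function names are hypothetical.

```python
def kumaraswamy_quantile(u, a, b):
    """Stand-in quantile function on (0, 1); NOT the BTCPE quantile function."""
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)


def macgillivray_skewness(quantile, u):
    """MacGillivray skewness at level u in (0, 1/2) for a given quantile function."""
    lower, upper, median = quantile(u), quantile(1.0 - u), quantile(0.5)
    return (upper + lower - 2.0 * median) / (upper - lower)


# Kumaraswamy(1, 1) is Uniform(0, 1): symmetric, so the measure is zero.
skew_uniform = macgillivray_skewness(lambda u: kumaraswamy_quantile(u, 1.0, 1.0), 0.25)

# Kumaraswamy(1, 3) has a decreasing density (right-skewed): the measure is positive.
skew_right = macgillivray_skewness(lambda u: kumaraswamy_quantile(u, 1.0, 3.0), 0.25)
```

Plotting this function over a grid of u for several parameter values would give curves of the kind described for Figure 2, once the stand-in is replaced by the BTCPE quantile function.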
The kurtosis of the BTCPE distribution can be studied using the Moors kurtosis [25]. The Moors (coefficient of) kurtosis is usually given by
Large values of the Moors kurtosis imply that the distribution has a heavy tail, and small values are indications of a light tail. Figure 3 displays the Moors kurtosis for the BTCPE distribution. It can be observed that the BTCPE distribution exhibits various degrees of kurtosis. When the parameters and are equal, the distribution displays a platykurtic shape. The overall shapes show how flexible the BTCPE distribution is with regard to modeling datasets having different degrees of kurtosis and skewness.
Figure 3.
Plots of Moors kurtosis.
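The Moors kurtosis depends only on the octiles of a distribution, so it can be sketched for any quantile function. The snippet below is a minimal illustration using the usual octile-based form, [Q(7/8) - Q(5/8) + Q(3/8) - Q(1/8)] / [Q(6/8) - Q(2/8)]; the uniform and standard normal quantile functions are stand-ins, and the function name is hypothetical.

```python
from statistics import NormalDist


def moors_kurtosis(quantile):
    """Moors octile-based kurtosis for a given quantile function."""
    octile = [quantile(k / 8.0) for k in range(1, 8)]  # octile[k - 1] = Q(k / 8)
    numerator = (octile[6] - octile[4]) + (octile[2] - octile[0])
    return numerator / (octile[5] - octile[1])


# Stand-in reference cases (not the BTCPE distribution).
moors_uniform = moors_kurtosis(lambda u: u)             # Uniform(0, 1)
moors_normal = moors_kurtosis(NormalDist().inv_cdf)     # standard normal
```

For the uniform distribution the measure equals 1, and for the normal distribution it is about 1.233, which gives a convenient reference point when reading plots such as Figure 3.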
3.3. Moments and Moment Generating Function
The moments, incomplete moments and moment generating function of the BTCPE distribution are presented in this subsection.
Proposition 2.
If is a BTCPE random variable, i.e., a random variable with the BTCPE distribution, then its non-central moment is given by
where is the beta function.
Proof.
The non-central moment of the BTCPE random variable is defined as . Thus, substituting the expanded form of the PDF given in Equation (5) yields
Letting and
, we get
Hence, some algebraic manipulation yields
This completes the proof. ☐
The non-central moments can be used to derive other important characteristics of the BTCPE distribution, such as the variance and the coefficients of skewness and kurtosis.
Proposition 3.
The incomplete moment of the BTCPE random variable is given by
where is the incomplete beta function.
Proof.
By definition, the incomplete moment is given by
Hence, substituting the expanded form of the PDF into the definition yields
Letting and , and applying arguments similar to those used in the proof of Proposition 2, yields
This completes the proof. ☐
The moment generating function is useful for deriving the moments of a random variable, provided that the moments exist.
Proposition 4.
The moment generating function of the BTCPE random variable is given by
Proof.
By definition and a standard exponential expansion, we have . Hence, substituting the non-central moment of the BTCPE distribution into the definition completes the proof. ☐
Table 1 shows the first six moments of the BTCPE distribution and other useful measures, such as the standard deviation (SD), coefficients of variation (CV), skewness (CS) and kurtosis (CK). The SD, CV, CS and CK are, respectively, given by
and
Table 1.
Values of moment measures, including the SD, CV, CS and CK.
From Table 1, the CS is negative for some parameter values and positive for others. The BTCPE distribution can be leptokurtic or platykurtic depending on the parameter values, since the CK can be greater than 3 or lower than 3, respectively. The coefficient of skewness also reveals that the BTCPE distribution can model both left- and right-skewed data.
3.4. Order Statistics
Order statistics play an important role in both statistics and industrial reliability analysis. They can be used to estimate the minimum, maximum, and range of observations. They are used in developing control charts that are useful in industrial quality control analyses. Let be order statistics from BTCPE random variables. Then, the PDF of is given by
where
Using the binomial expansion , we can write
Thus, we have
On the other side, the CDF of is simply given by
and the CDF of is derived as
The distribution of the smallest order statistic represents the lifetime of a system connected in series, and that of the maximum order statistic denotes the lifetime of a system connected in parallel. Hence, they are vital in studying the minimum and maximum time to failure of components in engineering reliability. The minimum and maximum (min–max) plots of the order statistics can be used to investigate the distributional behavior of observations. The min–max plot captures not only the information in the tails but all the information about the whole distribution. The min–max plots shown in Figure 4 for some parameter values depend on and . From the min–max plots, the distribution can exhibit symmetrical, left-skewed, and right-skewed shapes.
Figure 4.
Min–max plots for some parameter values.
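The series and parallel interpretation of the extreme order statistics can be made concrete with the standard identities for independent and identically distributed observations: the CDF of the minimum is 1 - (1 - F(x))^n and the CDF of the maximum is F(x)^n. The sketch below takes the value of the parent CDF at a point as input, so it applies to any parent distribution; the function names are hypothetical.

```python
def min_order_cdf(parent_cdf_value, n):
    """CDF of the smallest of n i.i.d. observations (series system lifetime)."""
    return 1.0 - (1.0 - parent_cdf_value) ** n


def max_order_cdf(parent_cdf_value, n):
    """CDF of the largest of n i.i.d. observations (parallel system lifetime)."""
    return parent_cdf_value ** n


# At a point where the parent CDF equals 0.5, with n = 3 components:
series_prob = min_order_cdf(0.5, 3)    # 1 - 0.5**3 = 0.875
parallel_prob = max_order_cdf(0.5, 3)  # 0.5**3 = 0.125
```

The series system is far more likely to have failed by that point than the parallel system, which is the behavior the min–max plots summarize across the whole support.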
4. Bivariate Extension
Researchers may be interested in modeling the dependence between two (quantitative) measures in a dataset. For instance, one may be interested in modeling the relationship between age and the body mass index of individuals. Bivariate distributions have been used in reliability analysis, queuing theory, finance, and insurance risk analysis, among others, to study interdependency (see [26]). In this section, the bivariate extension of the BTCPE (BEBTCPE) distribution is proposed following the strategy developed in [26,27]. Given a bivariate continuous random vector , the CDF of the BEBTCPE distribution with parameters , where , and , is given by
where . The parameters and quantify the dependence between the two variables of a BEBTCPE random vector. The plots of the CDF for the following parameter values are shown in Figure 5:
Figure 5.
CDF plots of the BEBTCPE distribution.
- (a)
- ;
- (b)
- and
- (c)
- .
We notice various concave and convex shapes from these plots.
The corresponding bivariate PDF is given by
Figure 6 shows the bivariate PDF plots of the BEBTCPE distribution for the following parameter values:
Figure 6.
PDF plots of the BEBTCPE distribution.
- (a)
- ;
- (b)
- and
- (c)
- .
The first graph displays a non-monotonic shape whereas the other two exhibit monotonic shapes, illustrating the versatility in the bivariate modeling sense.
5. Parameter Estimation Methods
This section presents nine methods for estimating the parameters of the BTCPE distribution: maximum likelihood (ML) estimation (MLE), ordinary least squares (OLS), weighted least squares (WLS), Cramér–von Mises (CVM), Anderson–Darling (AD), percentile (PC), maximum product spacing (MPS), minimum spacing absolute distance (MSAD), and minimum spacing absolute-log distance (MSALD) estimation.
5.1. Maximum Likelihood Estimation
One of the most common methods used for estimating the parameters of a developed model is the MLE method. Suppose that follows the BTCPE distribution, with as the parameter vector. For a single observation of , the log-likelihood function is
To obtain the estimates of the parameters for the single observation, the first partial derivatives of Equation (14) with respect to the parameters need to be derived. Here, we obtain
and
Given that are independent and identically distributed observations from BTCPE random variables, the total log-likelihood function is given by , where is defined in Equation (14) with . The estimates of the parameters can be obtained by maximizing the total log-likelihood function directly using software such as MATLAB, MATHEMATICA or R. In this study, the R software is used [28]. Alternatively, the estimates of the parameters can be obtained by equating the first partial derivatives with respect to the parameters to zero and solving the resulting system of equations simultaneously. However, since the resulting system of equations has no closed-form solution, it is solved numerically to obtain the estimates of the parameters.
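The direct numerical maximization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the two-parameter Kumaraswamy distribution stands in for the BTCPE distribution (whose PDF is not reproduced here), a simple derivative-free compass search stands in for the optimizers available in R, and all function names are hypothetical.

```python
import math
import random


def kumaraswamy_neg_loglik(log_params, data):
    """Negative log-likelihood of the stand-in Kumaraswamy(a, b) model.

    Parameters are on the log scale so that a and b remain positive.
    """
    a, b = math.exp(log_params[0]), math.exp(log_params[1])
    total = 0.0
    for x in data:
        total += (math.log(a) + math.log(b) + (a - 1.0) * math.log(x)
                  + (b - 1.0) * math.log(1.0 - x ** a))
    return -total


def compass_search(objective, start, step=0.5, tol=1e-4, max_iter=2000):
    """Derivative-free minimization: probe +/- step on each axis, halve on failure."""
    point, best = list(start), objective(start)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(point)):
            for delta in (step, -step):
                trial = list(point)
                trial[i] += delta
                value = objective(trial)
                if value < best:
                    point, best, improved = trial, value, True
        if not improved:
            step *= 0.5
    return point, best


# Simulate data from the stand-in model by inverse-transform sampling.
rng = random.Random(2022)
true_a, true_b = 2.0, 3.0
data = [(1.0 - (1.0 - rng.random()) ** (1.0 / true_b)) ** (1.0 / true_a)
        for _ in range(1000)]

objective = lambda p: kumaraswamy_neg_loglik(p, data)
log_hat, neg_loglik_hat = compass_search(objective, [0.0, 0.0])
a_hat, b_hat = math.exp(log_hat[0]), math.exp(log_hat[1])
```

Optimizing on the log scale avoids explicit positivity constraints. In practice a quasi-Newton or simulated-annealing routine would usually replace the compass search, and the stand-in likelihood would be replaced by the BTCPE log-likelihood of Equation (14).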
5.2. Ordinary and Weighted Least Squares Estimation
Suppose that are ordered observations from BTCPE random variables. The OLS estimates of the parameters and are obtained by minimizing the following function:
with respect to the parameters and . Alternatively, the OLS estimates can be obtained by numerically solving the following nonlinear equations:
where
and
The WLS estimates and are obtained by minimizing the following function:
with respect to the parameters and . Alternatively, the WLS estimates can be obtained by numerically solving the following nonlinear equations:
where are defined above.
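As a hedged sketch, the two least-squares criteria can be written as functions of the fitted CDF evaluated at the ordered observations. The forms below are the standard ones from the literature, with plotting positions i/(n+1) and WLS weights (n+1)^2 (n+2) / [i (n-i+1)]; they are assumed to match the objective functions of this subsection, and the function names are hypothetical.

```python
def ols_objective(cdf_sorted):
    """Sum of squared gaps between fitted CDF values and i / (n + 1)."""
    n = len(cdf_sorted)
    return sum((f - i / (n + 1)) ** 2 for i, f in enumerate(cdf_sorted, start=1))


def wls_objective(cdf_sorted):
    """Same gaps, weighted by the inverse variance of the i-th uniform order statistic."""
    n = len(cdf_sorted)
    return sum(
        (n + 1) ** 2 * (n + 2) / (i * (n - i + 1)) * (f - i / (n + 1)) ** 2
        for i, f in enumerate(cdf_sorted, start=1)
    )


# Hypothetical fitted CDF values at three ordered observations.
fitted = [0.2, 0.5, 0.8]
ols_value = ols_objective(fitted)   # 0.05**2 + 0 + 0.05**2
wls_value = wls_objective(fitted)   # each outer term is weighted by 80 / 3
```

Minimizing either function over the parameters that generate `cdf_sorted` gives the corresponding estimates; the WLS version penalizes misfit in the tails more heavily.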
5.3. Cramér–Von Mises Estimation
Let be ordered observations from BTCPE random variables. The CVM estimates of the parameters and are obtained by minimizing the following function:
with respect to the parameters and . The estimates of the parameters can also be obtained by numerically solving the following equations:
where are given above.
5.4. Anderson–Darling Estimation
Another minimum distance estimation method is the AD estimation technique. Let be ordered observations from BTCPE random variables. The AD estimates for the parameters of the BTCPE distribution are obtained by minimizing the following function:
with respect to the parameters and .
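The AD criterion can likewise be sketched as a function of the fitted CDF values at the ordered observations. The form below is the usual Anderson–Darling statistic, -n - (1/n) Σ (2i-1)[log F(x_(i)) + log(1 - F(x_(n+1-i)))], assumed to coincide with the objective function of this subsection; the function name is hypothetical.

```python
import math


def anderson_darling_objective(cdf_sorted):
    """Anderson-Darling distance between fitted CDF values and the uniform law."""
    n = len(cdf_sorted)
    total = 0.0
    for i in range(1, n + 1):
        f_low = cdf_sorted[i - 1]   # F(x_(i))
        f_high = cdf_sorted[n - i]  # F(x_(n + 1 - i))
        total += (2 * i - 1) * (math.log(f_low) + math.log(1.0 - f_high))
    return -n - total / n


# Hypothetical fitted CDF values: evenly spread versus piled up near 1.
good_fit = anderson_darling_objective([0.25, 0.5, 0.75])
poor_fit = anderson_darling_objective([0.90, 0.95, 0.99])
```

The evenly spread values give a much smaller distance than the values concentrated near 1, which is the behavior the minimization exploits.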
5.5. Percentile Estimation
The PC estimation approach is another method of estimating the parameters of a given model. Let be ordered observations from BTCPE random variables and be an unbiased estimate of . The PC estimates of the parameters of the BTCPE distribution are obtained by minimizing the following function:
with respect to the parameters and .
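A minimal sketch of the percentile criterion is given below. It assumes the common choice i/(n+1) for the estimate of the CDF at the i-th ordered observation and compares each ordered observation with the model quantile at that level; the uniform quantile function is a stand-in for the BTCPE quantile function, and the function name is hypothetical.

```python
def percentile_objective(sorted_data, quantile):
    """Sum of squared gaps between ordered data and model quantiles at i / (n + 1)."""
    n = len(sorted_data)
    return sum(
        (x - quantile(i / (n + 1))) ** 2
        for i, x in enumerate(sorted_data, start=1)
    )


uniform_quantile = lambda u: u  # stand-in quantile function

# Data sitting exactly on the model quantiles give zero; other data give a positive value.
pc_exact = percentile_objective([0.25, 0.5, 0.75], uniform_quantile)
pc_off = percentile_objective([0.2, 0.5, 0.9], uniform_quantile)  # 0.05**2 + 0.15**2
```

Because the criterion works on the data scale rather than the probability scale, it requires a closed-form or cheaply computable quantile function, which is one motivation for tractable unit distributions.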
5.6. Maximum and Minimum Product Spacing Estimation
An alternative parameter estimation technique which is based on the Kullback–Leibler information measure is the maximum product spacing (MPS). Let be ordered observations from BTCPE random variables. Consider the uniform spacing
where and . The estimates of the parameters are obtained via the MPS approach by maximizing the logarithm of the geometric mean of the spacing defined by
with respect to the parameters and .
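The MPS criterion can be sketched as follows, assuming the usual convention that the fitted CDF is 0 below the smallest observation and 1 above the largest, which yields n + 1 spacings. The function name is hypothetical and the input values are illustrative only.

```python
import math


def mps_objective(cdf_sorted):
    """Mean log-spacing of fitted CDF values; larger is better."""
    edges = [0.0] + list(cdf_sorted) + [1.0]
    spacings = [hi - lo for lo, hi in zip(edges, edges[1:])]
    return sum(math.log(d) for d in spacings) / len(spacings)


# Equal spacings attain the maximum possible value, log(1 / (n + 1)).
mps_equal = mps_objective([0.25, 0.5, 0.75])
mps_unequal = mps_objective([0.1, 0.5, 0.9])
```

Tied observations produce zero spacings and an undefined logarithm, so real data with ties need special handling before this criterion is applied.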
Additionally, the minimum spacing distance (MSD) estimates for the parameters and are obtained by minimizing the following function:
where is an appropriate distance, with respect to the parameters and . Although different choices of exist, in this study the absolute distance and the absolute-log distance are utilized. Thus, the minimum spacing absolute distance (MSAD) and minimum spacing absolute-log distance (MSALD) estimates are, respectively, obtained by minimizing the following functions:
and
where and .
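The two spacing distances can be sketched in the same way. The snippet assumes the standard definitions in which each spacing is compared with its expected value 1/(n+1), using the absolute distance for MSAD and the absolute difference of logarithms for MSALD; the function name is hypothetical.

```python
import math


def spacing_distances(cdf_sorted):
    """Return (MSAD, MSALD) objective values for fitted CDF values."""
    edges = [0.0] + list(cdf_sorted) + [1.0]
    spacings = [hi - lo for lo, hi in zip(edges, edges[1:])]
    target = 1.0 / len(spacings)  # expected spacing, 1 / (n + 1)
    msad = sum(abs(d - target) for d in spacings)
    msald = sum(abs(math.log(d) - math.log(target)) for d in spacings)
    return msad, msald


msad_equal, msald_equal = spacing_distances([0.25, 0.5, 0.75])
msad_unequal, msald_unequal = spacing_distances([0.1, 0.5, 0.9])
```

Both distances vanish when the spacings are perfectly even and grow as the fitted CDF values become less uniform.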
6. Simulation
In this section, simulation experiments are carried out to assess how well the estimation methods recover the parameters of the BTCPE distribution. The experiments are carried out with the following two different parameter combinations: and . The experiments are replicated 5000 times with the following different sample sizes: and 225. The bias (AB) and root mean square error (RMSE) of the estimates are then computed and compared.
The AB and RMSE are, respectively, computed using
and
where is either or and R = 5000 is used in this study.
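The two performance measures can be sketched as below. The RMSE follows its usual definition. For the AB, the snippet assumes the average of the differences between the estimates and the true value; if the source formula uses absolute differences instead, the first function would need to be adapted. The replicate estimates are made-up numbers for illustration.

```python
import math


def average_bias(estimates, true_value):
    """Mean of (estimate - true value) over replications (assumed AB definition)."""
    return sum(e - true_value for e in estimates) / len(estimates)


def rmse(estimates, true_value):
    """Root mean square error of the estimates around the true value."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))


# Hypothetical estimates from three replications, true parameter value 2.0.
replicates = [1.9, 2.1, 2.3]
ab_value = average_bias(replicates, 2.0)
rmse_value = rmse(replicates, 2.0)
```

In the simulation study the list of estimates would contain R = 5000 values per method, parameter and sample size.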
From Table 2 and Table 3, most of the estimates have their ABs and RMSEs decreasing as the sample size increases. This is an indication that most of the estimates exhibit the consistency property. From Table 2, it can be observed that for sample sizes 25, 75 and 125 the PC estimate is the best for and, for the sample sizes 175 and 225, the MLE is the best for . For the parameter , the PC estimate is the best for the sample size 25 and the MLE is the best for 75, 125, 175 and 225. In Table 3, for sample sizes 25 and 75 the AD estimate is the best for the parameter and the MLE is the best for 125, 175 and 225. For the parameter , the MLE is the best for sample sizes 25, 125, 175 and 225. The AD estimate is best for when the sample size is 75.
Table 2.
AB and RMSE for and .
Table 3.
AB and RMSE for and .
7. Applications
Three applications of the BTCPE distribution are illustrated in this section. Its performance is compared with that of other competitive distributions defined on the unit interval: the beta, unit Burr-III (UBIII) [29], bounded M-O extended exponential (BMOEE) [30], unit Gompertz (UG) [18], unit Lindley (UL) [17], unit Weibull (UW) [20] and unit-improved second-degree Lindley (UISDL) [31] distributions. The Akaike information criterion (AIC), Bayesian information criterion (BIC), Anderson–Darling (AD) statistic, and Cramér–von Mises (CVM) statistic are used to select the best model. For each of these criteria, the best model is the one with the smallest value. The datasets represent the mortality rate of COVID-19 patients in Canada and the United Kingdom (UK), and the recovery rate of COVID-19 patients in Spain. The first two datasets were recently reported by [8].
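For reference, the two information criteria can be computed from a maximized log-likelihood with their standard definitions, AIC = 2k - 2ℓ and BIC = k log(n) - 2ℓ, where k is the number of parameters and n the sample size. The numbers in the example are hypothetical and are not taken from Tables 5 to 7.

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion."""
    return 2.0 * n_params - 2.0 * loglik


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion."""
    return n_params * math.log(n_obs) - 2.0 * loglik


# Hypothetical fit: log-likelihood 50 for a two-parameter model on 60 observations.
aic_value = aic(50.0, 2)       # 4 - 100 = -96
bic_value = bic(50.0, 2, 60)   # 2 * log(60) - 100
```

Since all BTCPE competitors with the same number of parameters share the same penalty, their ranking by AIC or BIC reduces to a ranking by log-likelihood.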
The first dataset is the mortality rate for the UK from 1 December 2020 to 29 January 2021. The data are: 0.1292, 0.3805, 0.4049, 0.2564, 0.3091, 0.2413, 0.1390, 0.1127, 0.3547, 0.3126, 0.2991, 0.2428, 0.2942, 0.0807, 0.1285, 0.2775, 0.3311, 0.2825, 0.2559, 0.2756, 0.1652, 0.1072, 0.3383, 0.3575, 0.2708, 0.2649, 0.0961, 0.1565, 0.1580, 0.1981, 0.4154, 0.3990, 0.2483, 0.1762, 0.1760, 0.1543, 0.3238, 0.3771, 0.4132, 0.4602, 0.352, 0.1882, 0.1742, 0.4033, 0.4999, 0.3930, 0.3963, 0.3960, 0.2029, 0.1791, 0.4768, 0.5331, 0.3739, 0.4015, 0.3828, 0.1718, 0.1657, 0.4542, 0.4772, 0.3402.
The second dataset denotes the mortality rate for Canada from 1 November to 26 December 2020. The data are: 0.1622, 0.1159, 0.1897, 0.1260, 0.3025, 0.2190, 0.2075, 0.2241, 0.2163, 0.1262, 0.1627, 0.2591, 0.1989, 0.3053, 0.2170, 0.2241, 0.2174, 0.2541, 0.1997, 0.3333, 0.2594, 0.2230, 0.2290, 0.1536, 0.2024, 0.2931, 0.2739, 0.2607, 0.2736, 0.2323, 0.1563, 0.2677, 0.2181, 0.3019, 0.2136, 0.2281, 0.2346, 0.1888, 0.2729, 0.2162, 0.2746, 0.2936, 0.3259, 0.2242, 0.1810, 0.2679, 0.2296, 0.2992, 0.2464, 0.2576, 0.2338, 0.1499, 0.2075, 0.1834, 0.3347, 0.2362.
The third dataset consists of the recovery rates of COVID-19 patients in Spain from 3 March to 7 May 2020. The dataset can be found in [1], and the values are: 0.6670, 0.5000, 0.5000, 0.4286, 0.7500, 0.6531, 0.5161, 0.7895, 0.7689, 0.6873, 0.5200, 0.7251, 0.6375, 0.6078, 0.6289, 0.5712, 0.5923, 0.6061, 0.5924, 0.5921, 0.5592, 0.5954, 0.6164, 0.6455, 0.6725, 0.6838, 0.6850, 0.6947, 0.7210, 0.7315, 0.7412, 0.7508, 0.7519, 0.7547, 0.7645, 0.7715, 0.7759, 0.7807, 0.7838, 0.7847, 0.7871, 0.7902, 0.7934, 0.7913, 0.7962, 0.7971, 0.7977, 0.8007, 0.8038, 0.8289, 0.8322, 0.8354, 0.8371, 0.8387, 0.8456, 0.8490, 0.8535, 0.8547, 0.8564, 0.8580, 0.8604, 0.8628, 0.6586, 0.7070, 0.7963, 0.8516.
The ML estimates of the parameters are obtained using the bbmle package in R [32]. The initial values of the parameters of the fitted distributions used for the optimization are obtained using the GenSA package in R [33]. Table 4 displays the descriptive statistics for COVID-19 mortality for the UK and Canada, as well as the recovery rate for Spain. The datasets are platykurtic, as indicated by their negative excess kurtosis. The UK mortality is right-skewed and that of Canada is left-skewed. The recovery rate for Spain is also left-skewed. This is affirmed by the boxplots of the datasets shown in Figure 7.
Table 4.
Descriptive statistics for datasets.
Figure 7.
Boxplots of COVID-19 datasets.
7.1. UK COVID-19 Mortality
Table 5 presents ML estimates of the parameters and their corresponding standard errors in brackets, the log-likelihood (), AIC, BIC, AD, and CVM for the fitted distributions. Given that it has the lowest values for the AIC, BIC, AD, and CVM and the maximum log-likelihood, the BTCPE distribution offers the best fit to the UK mortality dataset.
Table 5.
Parameter estimates and model selection criteria for UK.
Figure 8 displays the empirical and fitted PDFs and CDFs of the various distributions used to model the UK mortality dataset. The figure gives an indication that the BTCPE distribution provides a good fit to the dataset compared to the other models.
Figure 8.
Empirical and fitted PDFs (left) and CDFs (right) of UK dataset.
Figure 9 shows the probability–probability (P-P) plots of the fitted distributions. It again indicates that the BTCPE distribution fits the UK mortality dataset well because the points cluster along the diagonal.
Figure 9.
P-P plots for UK mortality.
The profile log-likelihood plots for the estimated parameter values of the BTCPE distribution for the UK mortality data are shown in Figure 10. From the plots, it can be observed that the estimated values are the true maxima.
Figure 10.
Profile log-likelihood plots for estimated parameters of BTCPE for UK.
7.2. Canada COVID-19 Mortality
Table 6 presents ML estimates of the parameters and their corresponding standard errors in brackets and model selection criteria for the fitted distributions. The BTCPE distribution again provides the best fit to the Canada mortality dataset since it has the highest log-likelihood and the lowest values of the AIC, BIC, AD, and CVM.
Table 6.
Parameter estimates and model selection criteria for Canada.
Figure 11 shows the empirical and fitted PDFs and CDFs of the various distributions used to model the Canada mortality dataset. The figure indicates that the BTCPE distribution provides a better fit to the Canada mortality data than the other models, as it tracks the empirical PDF and CDF of the dataset more closely.
Figure 11.
Empirical and fitted PDFs (left) and CDFs (right) of Canada dataset.
Figure 12 shows the P-P plots of the fitted models. Figure 12 gives an indication that the BTCPE distribution provides a good fit to the Canada mortality as the points cluster along the diagonal.
Figure 12.
P-P plots for Canada mortality.
Figure 13 displays the profile log-likelihood plots for the estimated parameter values of the BTCPE distribution for the Canada mortality data. It can be observed from the plots that the estimates are unique and represent the true maxima.
Figure 13.
Profile log-likelihood plots for estimated parameters of BTCPE for Canada.
7.3. Spain COVID-19 Recovery Rate
The ML estimates of the parameters and their corresponding standard errors in brackets and model selection criteria for the fitted distributions are shown in Table 7. Because it has the lowest values for the AIC, BIC, AD, and CVM and the maximum log-likelihood, the BTCPE distribution again offers the best fit to the Spain recovery rate dataset.
Table 7.
Parameter estimates and model selection criteria for Spain.
The empirical and fitted PDFs and CDFs of the various distributions used to model the Spain recovery rate dataset are shown in Figure 14. It can be seen that the BTCPE distribution provides a better fit to the recovery rate data than the other models.
Figure 14.
Empirical and fitted PDFs (left) and CDFs (right) of Spain dataset.
The P-P plots of the fitted models for the recovery rate data are displayed in Figure 15. The plots indicate that the BTCPE distribution provides a good fit to the recovery rate data as the points cluster along the diagonal.
Figure 15.
P-P plots for Spain recovery data.
The profile log-likelihood plots for the estimated parameter values of the BTCPE distribution for the recovery rate data are shown in Figure 16. The plots suggest that the estimates are unique and represent the true maxima.
Figure 16.
Profile log-likelihood plots for estimated parameters of BTCPE for Spain.
8. Quantile Regression
When the response variable defined in the unit interval is skewed or contaminated with outliers, the beta regression model, which models the conditional mean of the response variable, is no longer reliable. A robust regression model is needed to model the effects of the covariates on the response variable. In this study, a quantile regression model is proposed for modeling the conditional quantile of the response variable. Given the quantile function of the BTCPE distribution, the PDF of the BTCPE distribution can be re-parameterized in terms of its quantile as . If , then the re-parameterized PDF is
The parameter is the quantile parameter. The BTCPE quantile regression is defined as
where is the vector of unknown parameters, is the quantile parameter and is the known vector of covariates. The link function is used to link the covariates to the conditional quantile of the dependent variable . The logit link function is adopted since . Hence, we have
Further, we can write
Substituting into the re-parameterized PDF, the log-likelihood for estimating the parameters of the BTCPE quantile regression is given by
where . The estimates of the parameters of the regression equation are obtained by directly maximizing the log-likelihood function. They will be denoted as and of and , respectively.
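The logit link used in the regression structure above maps a linear predictor to a conditional quantile in the unit interval. The sketch below shows only this mapping, not the BTCPE log-likelihood; the coefficient and covariate values are hypothetical and the function names are assumptions for illustration.

```python
import math


def inverse_logit(eta):
    """Numerically stable inverse of the logit link, mapping the real line to (0, 1)."""
    if eta >= 0.0:
        return 1.0 / (1.0 + math.exp(-eta))
    exp_eta = math.exp(eta)
    return exp_eta / (1.0 + exp_eta)


def conditional_quantile(beta, covariates):
    """Conditional quantile implied by the linear predictor x'beta under the logit link."""
    eta = sum(b * x for b, x in zip(beta, covariates))
    return inverse_logit(eta)


# Hypothetical coefficients (intercept first) and one covariate row with a leading 1.
mu_zero = conditional_quantile([0.0, 1.0, -1.0], [1.0, 0.5, 0.5])  # eta = 0
mu_high = conditional_quantile([2.0, 0.0, 0.0], [1.0, 0.5, 0.5])   # eta = 2
```

Each fitted value would then be substituted into the re-parameterized PDF to evaluate the log-likelihood contribution of the corresponding observation.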
8.1. Residual Analysis
Model diagnostics are essential when fitting a model to a dataset. Often, the behavior of the model residuals is examined to see if the model really provides a good fit to the data. In this study, the randomized quantile residuals are used to assess the adequacy of the regression model. The randomized quantile residuals are defined as
where is the quantile of the standard normal distribution. The randomized quantile residuals are expected to be distributed as the standard normal distribution if the models provide a good fit to the data.
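For a continuous response, the quantile residual is the standard normal quantile of the fitted CDF evaluated at each observation. The sketch below assumes the fitted CDF values are already available, since the BTCPE CDF is not reproduced here; the input values and the function name are hypothetical.

```python
from statistics import NormalDist


def quantile_residuals(fitted_cdf_values):
    """Map fitted CDF values in (0, 1) to standard normal quantiles."""
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf(p) for p in fitted_cdf_values]


# Hypothetical fitted CDF values for three observations.
residuals = quantile_residuals([0.5, 0.975, 0.025])
```

If the model is adequate, these residuals behave like a standard normal sample, which is what the P-P and half-normal plots are designed to check.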
8.2. Monte Carlo Simulation for Quantile Regression
Monte Carlo simulations are carried out in this section to examine the performance of the ML estimates of the parameters of the BTCPE regression model. The exercise is performed with two covariates. The following regression structure is adopted for the simulation:
The observations for the response variable are generated from the BTCPE distribution using sample sizes and 700. The experiments were repeated 5000 times for each sample size. The performance of the ML estimates is examined using AB and RMSE. The simulations were carried out using the median, . The following parameter combinations were used in the simulation: and . From the simulation results shown in Table 8, the ABs and RMSEs of the estimates decrease as the sample size increases. This suggests that the ML estimates for the BTCPE regression parameters are consistent.
Table 8.
Simulation results for the quantile regression.
8.3. Application
The application of the quantile regression model is demonstrated in this section using a real dataset. The data are taken from [34] and are also available at http://www.leg.ufpr.br/doku.php/publications:papercompanions:multquasibeta (accessed on 30 August 2022). The data consist of body fat percentage (response variable) measured in five regions: android, arms, gynoid, legs and trunk. The data comprise 298 observations and the independent variables are: age (in years), body mass index (in kg/m²), sex (female or male) and IPAQ (sedentary (S), insufficiently active (I), or active (A)). In this study, the response variable body fat percentage at arms is regressed on age (), body mass index () and sex (, 0 for female and 1 for male). The response variable is regressed on the covariates using the relationship . Table 9 presents ML estimates, standard errors, and p-values for the parameters of the fitted models for the different quantiles. The estimates are all significant at the 5% level of significance.
Table 9.
ML estimates for quantile regression.
Table 10 presents the model selection criteria for the different quantiles. The 0.90 quantile level provides the best fit to the data, as it has the smallest values of the model selection criteria.
Table 10.
Model selection criteria for quantile regression.
Figure 17 shows the rate of change of the regression coefficients for the different quantile levels and the corresponding 95% confidence interval (CI). It can be observed that all the coefficients approach zero as the quantile level increases, suggesting that they are more important in explaining smaller quantiles.
Figure 17.
Rate of change of regression coefficients for different quantiles.
Figures 18 and 19 show the P-P plots and the half-normal plots with simulated envelopes, respectively, for the randomized quantile residuals. These figures indicate that the BTCPE quantile regression model fits the percentage of body fat in the arms well for .
Figure 18. P-P plots for randomized quantile residuals.
Figure 19. Half-normal plots with simulated envelopes for randomized quantile residuals.
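The residual diagnostics behind these two plots can be sketched in a few lines. For a continuous response, the randomized quantile residual involves no actual randomization and reduces to the standard normal quantile of the fitted CDF evaluated at each observation. The sketch below again uses the power function distribution on (0, 1) as a stand-in for the fitted model, because the BTCPE CDF is not restated in this section. The function names, the seed, the shape value 3.0, and the 99 simulated samples are assumptions for illustration.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def quantile_residuals(y, cdf):
    """Quantile residuals r_i = Phi^{-1}(F(y_i)) for a continuous response."""
    p = np.clip(cdf(np.asarray(y, dtype=float)), 1e-12, 1.0 - 1e-12)
    return np.array([_STD_NORMAL.inv_cdf(float(v)) for v in p])


def pp_points(residuals):
    """Coordinates of a P-P plot: sorted Phi(r_i) against plotting positions."""
    n = len(residuals)
    empirical = (np.arange(1, n + 1) - 0.5) / n
    theoretical = np.sort([_STD_NORMAL.cdf(float(r)) for r in residuals])
    return theoretical, empirical


def fit_power(y):
    """Closed-form ML estimate for the stand-in CDF F(y) = y**a."""
    return -len(y) / np.sum(np.log(y))


def sample_power(a, n, rng):
    """Draw n values from the stand-in model by inverse transform."""
    return (1.0 - rng.uniform(size=n)) ** (1.0 / a)


def half_normal_envelope(y, n_sim, rng, level=0.95):
    """Sorted |residuals| with a pointwise simulated envelope."""
    a_hat = fit_power(y)
    observed = np.sort(np.abs(quantile_residuals(y, lambda v: v ** a_hat)))
    sims = np.empty((n_sim, len(y)))
    for s in range(n_sim):
        y_sim = sample_power(a_hat, len(y), rng)  # simulate from fitted model
        a_sim = fit_power(y_sim)                  # refit on simulated data
        sims[s] = np.sort(np.abs(quantile_residuals(y_sim, lambda v: v ** a_sim)))
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(sims, [alpha, 1.0 - alpha], axis=0)
    return observed, lower, upper


rng = np.random.default_rng(19)
y = sample_power(3.0, 298, rng)                   # synthetic stand-in data
a_hat = fit_power(y)
residuals = quantile_residuals(y, lambda v: v ** a_hat)
theoretical, empirical = pp_points(residuals)
observed, lower, upper = half_normal_envelope(y, n_sim=99, rng=rng)
inside = np.mean((observed >= lower) & (observed <= upper))
print(f"share of points inside the envelope: {inside:.2f}")
```

When the fitted model is adequate, the residuals behave approximately like a standard normal sample, the P-P points lie close to the diagonal, and most of the ordered absolute residuals fall inside the envelope.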
9. Conclusions
In this study, the BTCPE distribution is proposed for modeling datasets defined on the unit interval. The PDF of the distribution exhibits left-skewed, right-skewed, reversed J, and approximately symmetric shapes, and the HRF displays increasing and bathtub shapes. This makes the distribution a suitable candidate for modeling datasets that exhibit these traits. Nine estimation methods were proposed for estimating the parameters of the distribution, and the simulation results revealed that most of the resulting estimators were consistent. The applications of the BTCPE distribution were illustrated using datasets on the mortality rate and recovery rates of COVID-19. For all three datasets, the BTCPE model provided a better fit than the competing models. A quantile regression model for studying the relationship between the conditional quantiles of a bounded response variable and a set of covariates was also proposed, and its application was illustrated using real data. For the bivariate distribution, the study defined only the cumulative distribution and probability density functions. Our future research will study the detailed properties of the bivariate distribution, estimate its parameters, and illustrate its applications.
Author Contributions
Conceptualization, S.N., A.G.A. and C.C.; Data curation, S.N., A.G.A. and C.C.; Methodology, S.N., A.G.A. and C.C.; Supervision, S.N. and C.C.; Validation, S.N. and C.C.; Visualization, S.N. and A.G.A.; Writing, S.N. and A.G.A.; Review & editing, S.N. and C.C. All authors read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Conflicts of Interest
The authors declare no conflict of interest.
References
1. Afify, A.Z.; Nassar, M.; Kumar, D.; Cordeiro, G.M. A new unit distribution: Properties and applications. Electron. J. Appl. Stat. Anal. 2022, 15, 460–484.
2. Almazah, M.M.A.; Ullah, K.; Hussam, E.; Hossain, M.; Aldallal, R.; Riad, F.H. New Statistical Approaches for Modeling the COVID-19 Data Set: A Case Study in the Medical Sector. Complexity 2022, 2022, 1325825.
3. Alahmadi, A.A.; Alqawba, M.; Almutiry, W.; Shawki, A.W.; Alrajhi, S.; Al-Marzouki, S.; Elgarhy, M. A New Version of Weighted Weibull Distribution: Modelling to COVID-19 Data. Discret. Dyn. Nat. Soc. 2022, 2022, 3994361.
4. Algarni, A.; Almarashi, A.M.; Elbatal, I.; Hassan, A.S.; Almetwally, E.M.; Daghistani, A.M.; Elgarhy, M. Type I Half Logistic Burr X-G Family: Properties, Bayesian, and Non-Bayesian Estimation under Censored Samples and Applications to COVID-19 Data. Math. Probl. Eng. 2021, 2021, 5461130.
5. Bantan, R.A.R.; Shafiq, S.; Tahir, M.H.; Elhassanein, A.; Jamal, F.; Almutiry, W.; Elgarhy, M. Statistical Analysis of COVID-19 Data: Using a New Univariate and Bivariate Statistical Model. J. Funct. Spaces 2022, 2022, 2851352.
6. Arif, M.; Khan, D.M.; Aamir, M.; Khalil, U.; Bantan, R.A.R.; Elgarhy, M. Modeling COVID-19 Data with a Novel Extended Exponentiated Class of Distributions. J. Math. 2022, 2022, 1908161.
7. Nagy, M.; Almetwally, E.M.; Gemeay, A.M.; Mohammed, H.S.; Jawa, T.M.; Sayed-Ahmed, N.; Muse, A.H. The New Novel Discrete Distribution with Application on COVID-19 Mortality Numbers in Kingdom of Saudi Arabia and Latvia. Complexity 2021, 2021, 7192833.
8. Almetwally, E.M. The Odd Weibull Inverse Topp–Leone Distribution with Applications to COVID-19 Data. Ann. Data Sci. 2021, 9, 121–140.
9. Muse, A.H.; Tolba, A.H.; Fayad, E.; Abu Ali, O.A.; Nagy, M.; Yusuf, M. Modelling the COVID-19 Mortality Rate with a New Versatile Modification of the Log-Logistic Distribution. Comput. Intell. Neurosci. 2021, 2021, 8640794.
10. Haq, M.A.U.; Babar, A.; Hashmi, S.; Alghamdi, A.S.; Afify, A.Z. The Discrete Type-II Half-Logistic Exponential Distribution with Applications to COVID-19 Data. Pak. J. Stat. Oper. Res. 2021, 17, 921–932.
11. Gündüz, S.; Korkmaz, M. A New Unit Distribution Based on the Unbounded Johnson Distribution Rule: The Unit Johnson SU Distribution. Pak. J. Stat. Oper. Res. 2020, 16, 471–490.
12. Bantan, R.; Jamal, F.; Chesneau, C.; Elgarhy, M. Theory and Applications of the Unit Gamma/Gompertz Distribution. Mathematics 2021, 9, 1850.
13. Nasiru, S.; Abubakari, A.G.; Angbing, I.D. Bounded Odd Inverse Pareto Exponential Distribution: Properties, Estimation, and Regression. Int. J. Math. Math. Sci. 2021, 2021, 9955657.
14. Jodrá, P. A bounded distribution derived from the shifted Gompertz law. J. King Saud Univ.-Sci. 2020, 32, 523–536.
15. Haq, M.A.U.; Hashmi, S.; Aidi, K.; Ramos, P.L.; Louzada, F. Unit Modified Burr-III Distribution: Estimation, Characterizations and Validation Test. Ann. Data Sci. 2020, 99, 1–26.
16. Korkmaz, M.Ç. The unit generalized half normal distribution: A new bounded distribution with inference and application. U. P. B. Sci. Bull. Ser. A 2020, 82, 133–140.
17. Mazucheli, J.; Menezes, A.F.B.; Chakraborty, S. On the one parameter unit-Lindley distribution and its associated regression model for proportion data. J. Appl. Stat. 2019, 46, 700–714.
18. Mazucheli, J.; Menezes, A.F.; Dey, S. Unit-Gompertz distribution with applications. Statistica 2019, 79, 25–43.
19. Korkmaz, M. A new heavy-tailed distribution defined on the bounded interval. J. Appl. Stat. 2019, 47, 2097–2119.
20. Mazucheli, J.; Menezes, A.F.; Ghitany, M.E. The unit Weibull distribution and associated inference. J. Appl. Probab. Stat. 2018, 13, 1–22.
21. Ghitany, M.E.; Mazucheli, J.; Menezes, A.F.B.; Alqallaf, F. The unit-inverse Gaussian distribution: A new alternative to two-parameter distributions on the unit interval. Commun. Stat.-Theory Methods 2018, 48, 3423–3438.
22. Aldahlan, M.A.; Jamal, F.; Chesneau, C.; Elgarhy, M.; Elbatal, I. The Truncated Cauchy Power Family of Distributions with Inference and Applications. Entropy 2020, 22, 346.
23. Shaked, M.; Shanthikumar, J.G. Stochastic Orders; Wiley: New York, NY, USA, 2007.
24. MacGillivray, H.L. Skewness and Asymmetry: Measures and Orderings. Ann. Stat. 1986, 14, 994–1011.
25. Moors, J.J. A quantile alternative for kurtosis. J. R. Stat. Soc. Ser. D 1988, 37, 25–32.
26. Elhassanein, A. On Statistical Properties of a New Bivariate Modified Lindley Distribution with an Application to Financial Data. Complexity 2022, 2022, 2328831.
27. Ganji, M.; Bevrani, H.; Hami, N. A New Method For Generating Continuous Bivariate Families. J. Iran. Stat. Soc. 2018, 17, 109–129.
28. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2019; Available online: https://www.R-project.org/ (accessed on 30 October 2022).
29. Modi, K.; Gill, V. Unit Burr-III distribution with application. J. Stat. Manag. Syst. 2019, 23, 579–592.
30. Ghosh, I.; Dey, S.; Kumar, D. Bounded M-O Extended Exponential Distribution with Applications. Stoch. Qual. Control 2019, 34, 35–51.
31. Altun, E.; Cordeiro, G.M. The unit-improved second-degree Lindley distribution: Inference and regression modeling. Comput. Stat. 2019, 35, 259–279.
32. Bolker, B. Tools for General Maximum Likelihood Estimation; R Development Core Team: Vienna, Austria, 2014. Available online: https://github.com/bbolker/bbmle (accessed on 30 October 2022).
33. Xiang, Y.; Gubian, S.; Suomela, B.; Hoeng, J. Generalized simulated annealing: GenSA package. R J. 2013, 5, 13–29.
34. Petterle, R.R.; Bonat, W.H.; Scarpin, C.T.; Jonasson, T.; Borba, V.Z.C. Multivariate quasi-beta regression models for continuous bounded data. Int. J. Biostat. 2020, 17, 39–53.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).