Abstract
Ranked set sampling is an efficient method for estimating population parameters when measuring the actual observations is expensive or complicated. In this paper, we consider the problem of estimating the parameters of the two-parameter xgamma (TPXG) distribution under ranked set sampling as well as simple random sampling designs. Various estimation methods are considered, including maximum likelihood, maximum product of spacings, least squares, weighted least squares, Cramer–von Mises, and Anderson–Darling estimators. The ranked set sampling and simple random sampling estimators are compared, for the same number of measured units, through a simulation study in terms of bias, mean squared error, and efficiency. The merit of the ranked set sampling estimators is also examined using real data on bank customers. The results indicate that the estimators based on ranked set sampling are more efficient than their simple random sampling competitors considered in this study.
Keywords:
simple random sampling; xgamma distribution; weighted least squares; method of maximum product of spacings; ranked set sampling
MSC:
62F10; 62G30; 62G20
1. Introduction
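A random variable X follows the xgamma distribution with parameter θ > 0 if its probability density function (pdf) is given by
$$f(x;\theta) = \frac{\theta^2}{1+\theta}\left(1+\frac{\theta x^2}{2}\right)e^{-\theta x}, \quad x>0, \tag{1}$$
and its cumulative distribution function (CDF) is
$$F(x;\theta) = 1 - \frac{1+\theta+\theta x+\frac{\theta^2 x^2}{2}}{1+\theta}\,e^{-\theta x}, \quad x>0. \tag{2}$$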
Plots of the pdf of the xgamma distribution are presented in Figure 1 for some values of θ.
Figure 1.
The pdf plots of the xgamma distribution for different values of θ.
As an extension of the xgamma distribution, the two-parameter xgamma (TPXG) distribution was proposed in [1]. It adds a parameter to the xgamma distribution, which is widely used in survival analysis, to obtain a more flexible model for real data sets. When a random variable X follows the TPXG distribution, its pdf and CDF are, respectively, given by
$$f(x;\alpha,\theta) = \frac{\theta^2}{\alpha+\theta}\left(1+\frac{\alpha\theta x^2}{2}\right)e^{-\theta x}, \quad x>0,\ \alpha>0,\ \theta>0, \tag{3}$$
$$F(x;\alpha,\theta) = 1 - \left[1+\frac{\alpha\theta x(\theta x+2)}{2(\alpha+\theta)}\right]e^{-\theta x}, \quad x>0. \tag{4}$$
For α = 1 in (3), we obtain the xgamma distribution with parameter θ as a special case of the TPXG distribution.
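The rth moment of the distribution is
$$E(X^r) = \frac{r!\,\left[2\theta+\alpha(r+1)(r+2)\right]}{2\theta^r(\alpha+\theta)}, \quad r = 1, 2, \ldots$$
The characteristic function (CF) and hazard function of the model are, respectively, given by
$$\varphi_X(t) = \frac{\theta^2}{\alpha+\theta}\left[\frac{1}{\theta-it}+\frac{\alpha\theta}{(\theta-it)^3}\right], \qquad h(x) = \frac{\theta^2\left(2+\alpha\theta x^2\right)}{2(\alpha+\theta)+2\alpha\theta x+\alpha\theta^2x^2}.$$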
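The closed-form expressions can be checked numerically. The following Python sketch uses the standard library only and assumes the parameterization f(x; α, θ) = θ²/(α+θ)(1 + αθx²/2)e^{−θx}. It codes the TPXG pdf, CDF, hazard function, and rth moment. The function names are illustrative and are not taken from the paper.

```python
import math

def tpxg_pdf(x, alpha, theta):
    """TPXG density f(x; alpha, theta) for x > 0."""
    return theta**2 / (alpha + theta) * (1 + alpha * theta * x**2 / 2) * math.exp(-theta * x)

def tpxg_cdf(x, alpha, theta):
    """TPXG distribution function F(x; alpha, theta)."""
    return 1 - (1 + alpha * theta * x * (theta * x + 2) / (2 * (alpha + theta))) * math.exp(-theta * x)

def tpxg_hazard(x, alpha, theta):
    """Closed-form hazard h(x) = f(x) / [1 - F(x)]."""
    num = theta**2 * (2 + alpha * theta * x**2)
    den = 2 * (alpha + theta) + 2 * alpha * theta * x + alpha * theta**2 * x**2
    return num / den

def tpxg_moment(r, alpha, theta):
    """r-th raw moment E(X^r) for an integer r >= 0."""
    return math.factorial(r) * (2 * theta + alpha * (r + 1) * (r + 2)) / (2 * theta**r * (alpha + theta))

# With alpha = 1 the TPXG density coincides with the xgamma density.
theta = 1.5
xgamma_pdf = lambda x: theta**2 / (1 + theta) * (1 + theta * x**2 / 2) * math.exp(-theta * x)
```

For example, `tpxg_moment(1, 1.0, theta)` reproduces the xgamma mean (θ + 3)/(θ(1 + θ)), consistent with the special case α = 1.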
Figure 2 shows some possible shapes of the TPXG pdf for selected values of α and θ. These shapes reveal the flexibility of the distribution in modeling right-skewed observations. Figure 3 shows the possible shapes of the hazard function h(x): bathtub, increasing, decreasing, and decreasing–increasing. For more details on the TPXG distribution, see [1].
Figure 2.
The pdf plots of the TPXG distribution for different values of the parameters.
Figure 3.
Plots of the hazard function h(x) of the TPXG distribution for six selected combinations of α and θ, panels (a)–(f).
When the variable of interest is expensive to measure or difficult to obtain, but cheap and simple to rank, ranked set sampling is recognized as an effective sampling strategy for improving the accuracy and efficiency of parameter estimation. McIntyre [2] proposed the ranked set sampling scheme for estimating pasture and forage yields.
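Let X follow the TPXG distribution with pdf f(x) and CDF F(x), and let μ and σ² denote, respectively, the population mean and variance. Let $X_1, X_2, \ldots, X_m$ be a random sample with pdf f(x). The ranked set sampling (RSS) method can be described as follows:
- Step 1. Choose k simple random samples (SRSs), each of size k (the set size), from the underlying population:
$$
\begin{array}{cccc}
X_{11}, & X_{12}, & \ldots, & X_{1k},\\
X_{21}, & X_{22}, & \ldots, & X_{2k},\\
\vdots & \vdots & \ddots & \vdots\\
X_{k1}, & X_{k2}, & \ldots, & X_{kk}.
\end{array}
$$
- Step 2. Rank the units within each set from smallest to largest, either visually or by any cost-free method:
$$
\begin{array}{cccc}
X_{1(1:k)}, & X_{1(2:k)}, & \ldots, & X_{1(k:k)},\\
X_{2(1:k)}, & X_{2(2:k)}, & \ldots, & X_{2(k:k)},\\
\vdots & \vdots & \ddots & \vdots\\
X_{k(1:k)}, & X_{k(2:k)}, & \ldots, & X_{k(k:k)}.
\end{array}
$$
- Step 3. Select the ith order statistic (in bold) from the ith set, i = 1, 2, …, k:
$$
\begin{array}{cccc}
\mathbf{X_{1(1:k)}}, & X_{1(2:k)}, & \ldots, & X_{1(k:k)},\\
X_{2(1:k)}, & \mathbf{X_{2(2:k)}}, & \ldots, & X_{2(k:k)},\\
\vdots & \vdots & \ddots & \vdots\\
X_{k(1:k)}, & X_{k(2:k)}, & \ldots, & \mathbf{X_{k(k:k)}}.
\end{array}
$$
- Step 4. Repeat Steps 1–3 n times (cycles) to obtain an RSS of size nk.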
The selected RSS units are denoted by $X_{i(i:k)j}$, i = 1, 2, …, k, j = 1, 2, …, n. Here $X_{i(i:k)j}$ is the ith smallest unit in the ith set of size k in the jth cycle. Although k² units are drawn in each cycle, only k of them are measured. The measured units are not identically distributed, but they are independent because they are selected from different sets.
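The four RSS steps can be sketched in Python as follows. This is a minimal illustration under perfect ranking, not the authors' R code, and the function names are hypothetical. The TPXG generator rests on an assumption of ours: it writes the TPXG density f(x; α, θ) = θ²/(α+θ)(1 + αθx²/2)e^{−θx} as a mixture of two densities:
- an Exp(θ) density with weight θ/(α+θ);
- a Gamma(3, rate θ) density with weight α/(α+θ).

```python
import random

def tpxg_sample(alpha, theta, rng):
    """Draw one TPXG variate via its (assumed) two-component mixture form."""
    if rng.random() < theta / (alpha + theta):
        return rng.expovariate(theta)           # Exp(rate = theta)
    return rng.gammavariate(3.0, 1.0 / theta)   # Gamma(shape 3, scale 1/theta)

def ranked_set_sample(k, n, draw, rng):
    """Perfect-ranking RSS: returns nk tuples (i, j, x), where x is the
    i-th smallest unit of the i-th set of size k in cycle j."""
    sample = []
    for j in range(1, n + 1):                             # Step 4: n cycles
        for i in range(1, k + 1):
            ranked = sorted(draw(rng) for _ in range(k))  # Steps 1-2: draw and rank a set
            sample.append((i, j, ranked[i - 1]))          # Step 3: measure only the i-th unit
    return sample

rng = random.Random(1)
rss = ranked_set_sample(5, 4, lambda r: tpxg_sample(2.0, 1.5, r), rng)
```

Only the nk selected units are returned. The other k − 1 units in each set are ranked but never measured, which mirrors the cost structure that motivates RSS.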
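Takahasi and Wakimoto [3] developed the mathematical theory of RSS. They showed that, under perfect ranking, the RSS estimator of the mean is unbiased and more efficient than the SRS estimator because of its smaller variance. The SRS mean estimator based on nk measured units and its variance are given by
$$\bar{X}_{SRS} = \frac{1}{nk}\sum_{i=1}^{nk}X_i, \qquad Var(\bar{X}_{SRS}) = \frac{\sigma^2}{nk}.$$
The RSS estimator of the population mean and its variance are given by
$$\bar{X}_{RSS} = \frac{1}{nk}\sum_{j=1}^{n}\sum_{i=1}^{k}X_{i(i:k)j}, \qquad Var(\bar{X}_{RSS}) = \frac{1}{nk^2}\sum_{i=1}^{k}\sigma^2_{(i:k)}.$$
Here $\mu_{(i:k)}$ and $\sigma^2_{(i:k)}$ denote the mean and variance of the ith order statistic $X_{(i:k)}$ in a set of size k, whose pdf is $f_{(i:k)}(x)$.
Since $f(x) = \frac{1}{k}\sum_{i=1}^{k}f_{(i:k)}(x)$, we have
$$\mu = \frac{1}{k}\sum_{i=1}^{k}\mu_{(i:k)}, \quad\text{so that}\quad E(\bar{X}_{RSS}) = \mu.$$
They also showed that
$$Var(\bar{X}_{RSS}) = \frac{\sigma^2}{nk} - \frac{1}{nk^2}\sum_{i=1}^{k}\left(\mu_{(i:k)}-\mu\right)^2.$$
This follows from the decomposition $\sigma^2 = \frac{1}{k}\sum_{i=1}^{k}\sigma^2_{(i:k)} + \frac{1}{k}\sum_{i=1}^{k}\left(\mu_{(i:k)}-\mu\right)^2$.
Under perfect ranking, this relation shows that $Var(\bar{X}_{RSS}) \le \sigma^2/(nk) = Var(\bar{X}_{SRS})$ for the same number of measured observations, regardless of the underlying distribution. Even with ranking errors, Dell and Clutter [4] showed that the RSS mean estimator remains unbiased and is at least as efficient as the SRS estimator.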
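The variance relation can be seen at work for the TPXG distribution. The sketch below computes μ, the order-statistic means μ_(i:k), and the relative efficiency Var(X̄_SRS)/Var(X̄_RSS) = σ²/[σ² − (1/k)∑(μ_(i:k) − μ)²], which follows from the two variance expressions. It makes the following assumptions:
- perfect ranking;
- the parameterization f(x; α, θ) = θ²/(α+θ)(1 + αθx²/2)e^{−θx};
- truncated composite Simpson integration, which is our own numerical choice.

The function names are illustrative.

```python
import math

def tpxg_pdf(x, alpha, theta):
    return theta**2 / (alpha + theta) * (1 + alpha * theta * x**2 / 2) * math.exp(-theta * x)

def tpxg_cdf(x, alpha, theta):
    return 1 - (1 + alpha * theta * x * (theta * x + 2) / (2 * (alpha + theta))) * math.exp(-theta * x)

def order_stat_pdf(x, i, k, alpha, theta):
    """Density f_(i:k)(x) of the i-th order statistic in a set of size k."""
    F = tpxg_cdf(x, alpha, theta)
    c = math.factorial(k) / (math.factorial(i - 1) * math.factorial(k - i))
    return c * F**(i - 1) * (1 - F)**(k - i) * tpxg_pdf(x, alpha, theta)

def simpson(g, a, b, panels=4000):
    """Composite Simpson rule on [a, b]; panels must be even."""
    h = (b - a) / panels
    s = g(a) + g(b)
    for j in range(1, panels):
        s += (4 if j % 2 else 2) * g(a + j * h)
    return s * h / 3

def rss_relative_efficiency(k, alpha, theta):
    """Return (mu, [mu_(i:k)], Var(SRS mean) / Var(RSS mean)) under perfect ranking."""
    upper = 80.0 / theta  # the integrands are negligible beyond this point
    mu = simpson(lambda x: x * tpxg_pdf(x, alpha, theta), 0.0, upper)
    var = simpson(lambda x: (x - mu)**2 * tpxg_pdf(x, alpha, theta), 0.0, upper)
    mus = [simpson(lambda x: x * order_stat_pdf(x, i, k, alpha, theta), 0.0, upper)
           for i in range(1, k + 1)]
    between = sum((m_i - mu)**2 for m_i in mus) / k
    return mu, mus, var / (var - between)

mu, mus, re = rss_relative_efficiency(k=5, alpha=2.0, theta=1.5)
```

The returned `mus` should average to `mu`, which is the identity μ = (1/k)∑μ_(i:k). The efficiency exceeds 1 whenever the μ_(i:k) are not all equal.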
Several modifications of RSS have been suggested in the literature:
- extreme RSS by Samawi et al. [5];
- median RSS by Muttlak [6];
- double RSS by Al-Saleh and Al-Kadiri [7];
- percentile RSS by Muttlak [8];
- L RSS by Al-Nasser [9];
- partial RSS by Haq et al. [10];
- neoteric RSS by Zamanzade and Al-Omari [11].

In addition, many authors have investigated parameter estimation for various distributions using RSS or its modifications. For example, Abu-Dayyeh et al. [12] estimated the logistic model parameters based on SRS and RSS. Parameter estimation for the generalized quasi-Lindley distribution was considered by Al-Omari et al. [13]. Yousef and Al-Subh [14] used maximum likelihood methods to estimate the Gumbel parameters under RSS. Akgul and Şenoglu [15] investigated some modifications of RSS for estimating the Weibull distribution parameters. Hussian [16] considered Bayesian and maximum likelihood approaches for estimating the Kumaraswamy distribution parameters under RSS. Chen et al. [17] used moving extremes RSS to estimate the scale parameter of scale distributions. Al-Omari and Bouza [18] considered ratio estimators of the population mean with missing values using RSS. Later, Al-Omari [19] considered varied L RSS and used maximum likelihood estimation in location-scale families. Hassan et al. [20] used median RSS to estimate the stress–strength reliability of the generalized inverted exponential distribution.
Given the importance of the TPXG distribution among lifetime distributions, and to the best of our knowledge, this is the first study to consider the RSS design for estimating the TPXG distribution parameters. The main focus of this paper is therefore to estimate the TPXG parameters under the RSS design using several well-known methods: maximum product of spacings, maximum likelihood, ordinary least squares, Cramer–von Mises, weighted least squares, and Anderson–Darling. The suggested RSS estimators are then compared with their SRS competitors for the same number of measured observations. A real data set is analyzed to illustrate the usefulness of the proposed estimators. The results show that the RSS estimators are generally better than their SRS counterparts in terms of MSE, bias, and efficiency for all estimation methods considered in the study.
The rest of this paper is organized as follows:
- Section 2 presents the estimation methods for the TPXG distribution parameters.
- Section 3 reports a simulation study showing the superiority of the RSS estimators relative to the SRS estimators.
- Section 4 examines the usefulness of the suggested estimators using a real data set fitted to the TPXG distribution.
- Section 5 presents the conclusions.
2. Method of Estimation
Here, based on the RSS design, six methods are considered for estimating the parameters α and θ of the TPXG distribution:
- the maximum likelihood (MLE) method;
- the maximum product of spacings (MPS) method;
- the ordinary least squares (OLS) method;
- the weighted least squares (WLS) method;
- the Cramer–von Mises (CV) method;
- the Anderson–Darling (AD) method.

In all methods, we denote by $X_{i(i:k)j}$, i = 1, …, k, j = 1, …, n, the ith order statistic from the ith set of size k in the jth cycle. These values are taken as the RSS data for X, with sample size nk.
2.1. MLE Method
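Considering an RSS sample of size nk, the likelihood function is
$$L(\alpha,\theta) = \prod_{j=1}^{n}\prod_{i=1}^{k} f_{(i:k)}\left(x_{i(i:k)j};\alpha,\theta\right),$$
with
$$f_{(i:k)}(x;\alpha,\theta) = \frac{k!}{(i-1)!\,(k-i)!}\left[F(x;\alpha,\theta)\right]^{i-1}\left[1-F(x;\alpha,\theta)\right]^{k-i} f(x;\alpha,\theta),$$
where f and F are the TPXG pdf and CDF. The log-likelihood function is
$$\ell(\alpha,\theta) = c + \sum_{j=1}^{n}\sum_{i=1}^{k}\left\{2\log\theta - \log(\alpha+\theta) + \log\left(1+\frac{\alpha\theta x_{ij}^2}{2}\right) - \theta x_{ij} + (i-1)\log F(x_{ij};\alpha,\theta) + (k-i)\log\left[1-F(x_{ij};\alpha,\theta)\right]\right\},$$
where $x_{ij} = x_{i(i:k)j}$ and c is a constant that does not depend on α and θ.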
The likelihood equations $\partial\ell/\partial\alpha = 0$ and $\partial\ell/\partial\theta = 0$ have no closed-form solution. They must therefore be solved numerically to obtain the MLEs $\hat\alpha_{MLE}$ and $\hat\theta_{MLE}$ of α and θ, respectively.
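A minimal Python sketch of the RSS maximum likelihood fit follows. It maximizes ℓ(α, θ) over the logarithms of the parameters with a simple compass (coordinate) search. The following are our assumptions for illustration, not the authors' R implementation:
- the compass-search optimizer;
- the Exp/Gamma(3) mixture used to simulate TPXG data;
- all function names.

```python
import math
import random

def tpxg_logpdf(x, a, t):
    """log f(x; alpha=a, theta=t) for the TPXG density."""
    return 2 * math.log(t) - math.log(a + t) + math.log1p(a * t * x * x / 2) - t * x

def tpxg_logsf(x, a, t):
    """log[1 - F(x)], computed from the survival function to avoid cancellation."""
    return math.log1p(a * t * x * (t * x + 2) / (2 * (a + t))) - t * x

def rss_loglik(a, t, data, k):
    """RSS log-likelihood up to an additive constant; data holds (i, x) pairs,
    where x is the i-th order statistic of its set of size k."""
    total = 0.0
    for i, x in data:
        log_s = tpxg_logsf(x, a, t)
        log_cdf = math.log(-math.expm1(log_s))  # log F(x)
        total += tpxg_logpdf(x, a, t) + (i - 1) * log_cdf + (k - i) * log_s
    return total

def compass_search(f, x0, step=0.5, tol=1e-6, max_iter=20000):
    """Minimise f by compass (coordinate pattern) search with step halving."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        if step < tol:
            break
        moved = False
        for d in range(len(x)):
            for s in (step, -step):
                y = list(x)
                y[d] += s
                fy = f(y)
                if fy < fx:
                    x, fx, moved = y, fy, True
                    break
        if not moved:
            step /= 2
    return x

def rss_mle(data, k, start=(1.0, 1.0)):
    """Return (alpha_hat, theta_hat); the search runs on log-parameters so both stay positive."""
    nll = lambda p: -rss_loglik(math.exp(p[0]), math.exp(p[1]), data, k)
    p = compass_search(nll, [math.log(v) for v in start])
    return math.exp(p[0]), math.exp(p[1])

def simulate_rss(alpha, theta, k, n, rng):
    """Perfect-ranking RSS of size nk as (i, x) pairs, using the assumed
    Exp/Gamma(3) mixture representation of the TPXG distribution."""
    def draw():
        if rng.random() < theta / (alpha + theta):
            return rng.expovariate(theta)
        return rng.gammavariate(3.0, 1.0 / theta)
    return [(i, sorted(draw() for _ in range(k))[i - 1])
            for j in range(n) for i in range(1, k + 1)]

data = simulate_rss(2.0, 1.5, k=5, n=200, rng=random.Random(2022))
alpha_hat, theta_hat = rss_mle(data, k=5)
```

Searching on log α and log θ keeps both parameters positive without explicit constraints. Any general-purpose optimizer could be substituted for the compass search.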
2.2. Method of MPS
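Cheng and Amin [21,22] introduced this method, which is based on maximizing the geometric mean of the spacings of the data. Let $x_{(1)} \le x_{(2)} \le \cdots \le x_{(m)}$ be the ordered observations of an RSS of size $m = nk$ from the TPXG distribution. The uniform spacings are given by
$$D_i(\alpha,\theta) = F(x_{(i)};\alpha,\theta) - F(x_{(i-1)};\alpha,\theta), \quad i = 1, 2, \ldots, m+1.$$
Note that $F(x_{(0)};\alpha,\theta) = 0$ and $F(x_{(m+1)};\alpha,\theta) = 1$. It is clear that $\sum_{i=1}^{m+1}D_i(\alpha,\theta) = 1$.
Let the geometric mean of the spacings be
$$G(\alpha,\theta) = \left[\prod_{i=1}^{m+1}D_i(\alpha,\theta)\right]^{\frac{1}{m+1}}.$$
The natural logarithm of $G(\alpha,\theta)$ is
$$H(\alpha,\theta) = \frac{1}{m+1}\sum_{i=1}^{m+1}\log D_i(\alpha,\theta).$$
The estimators $\hat\alpha_{MPS}$ and $\hat\theta_{MPS}$ are the values of α and θ that maximize the geometric mean of the spacings. They can be obtained by solving the following nonlinear equations:
$$\frac{\partial H}{\partial\alpha} = \frac{1}{m+1}\sum_{i=1}^{m+1}\frac{\Delta_1(x_{(i)};\alpha,\theta)-\Delta_1(x_{(i-1)};\alpha,\theta)}{D_i(\alpha,\theta)} = 0,$$
$$\frac{\partial H}{\partial\theta} = \frac{1}{m+1}\sum_{i=1}^{m+1}\frac{\Delta_2(x_{(i)};\alpha,\theta)-\Delta_2(x_{(i-1)};\alpha,\theta)}{D_i(\alpha,\theta)} = 0,$$
where
$$\Delta_1(x;\alpha,\theta) = \frac{\partial F(x;\alpha,\theta)}{\partial\alpha} = -\frac{\theta^2 x(\theta x+2)}{2(\alpha+\theta)^2}\,e^{-\theta x}$$
and
$$\Delta_2(x;\alpha,\theta) = \frac{\partial F(x;\alpha,\theta)}{\partial\theta} = \frac{\theta x\,e^{-\theta x}}{(\alpha+\theta)^2}\left[2\alpha+\theta+\frac{\alpha\theta x}{2}+\frac{\alpha\theta(\alpha+\theta)x^2}{2}\right],$$
with $\Delta_l(x_{(0)};\alpha,\theta) = \Delta_l(x_{(m+1)};\alpha,\theta) = 0$ for l = 1, 2. These equations can be solved numerically.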
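The MPS estimator for pooled RSS data can be sketched in the same way. The sketch rests on these assumptions:
- zero spacings, from ties or round-off, are floored at a tiny constant (more careful treatments of ties exist but are omitted here);
- the optimizer is a compass search on log-parameters;
- the simulated data come from an assumed Exp/Gamma(3) mixture representation of the TPXG distribution.

All names are illustrative.

```python
import math
import random

def tpxg_cdf(x, a, t):
    """TPXG distribution function F(x; alpha=a, theta=t)."""
    return 1 - (1 + a * t * x * (t * x + 2) / (2 * (a + t))) * math.exp(-t * x)

def mps_objective(a, t, xs_sorted):
    """H(alpha, theta): average log-spacing of the ordered sample."""
    F = [0.0] + [tpxg_cdf(x, a, t) for x in xs_sorted] + [1.0]
    # Zero spacings are floored at 1e-300; this guard is our simplification.
    return sum(math.log(max(hi - lo, 1e-300)) for lo, hi in zip(F, F[1:])) / (len(xs_sorted) + 1)

def compass_search(f, x0, step=0.5, tol=1e-6, max_iter=20000):
    """Minimise f by compass (coordinate pattern) search with step halving."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        if step < tol:
            break
        moved = False
        for d in range(len(x)):
            for s in (step, -step):
                y = list(x)
                y[d] += s
                fy = f(y)
                if fy < fx:
                    x, fx, moved = y, fy, True
                    break
        if not moved:
            step /= 2
    return x

def mps_fit(values, start=(1.0, 1.0)):
    """MPS estimates (alpha_hat, theta_hat) from the pooled RSS measurements."""
    xs = sorted(values)  # x_(1) <= ... <= x_(m)
    neg_h = lambda p: -mps_objective(math.exp(p[0]), math.exp(p[1]), xs)
    p = compass_search(neg_h, [math.log(v) for v in start])
    return math.exp(p[0]), math.exp(p[1])

def simulate_rss_values(alpha, theta, k, n, rng):
    """Measured values of a perfect-ranking RSS of size nk (assumed mixture generator)."""
    def draw():
        if rng.random() < theta / (alpha + theta):
            return rng.expovariate(theta)
        return rng.gammavariate(3.0, 1.0 / theta)
    return [sorted(draw() for _ in range(k))[i - 1] for j in range(n) for i in range(1, k + 1)]

values = simulate_rss_values(2.0, 1.5, k=5, n=200, rng=random.Random(7))
alpha_hat, theta_hat = mps_fit(values)
```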
2.3. Methods of LS
Well-known results in probability theory indicate that $F(X_{(i)}) \sim Beta(i, m-i+1)$, where F is a distribution function and $X_{(i)}$ is the ith order statistic of the sample $X_1, X_2, \ldots, X_m$. Therefore,
$$E\left[F(X_{(i)})\right] = \frac{i}{m+1} \quad\text{and}\quad Var\left[F(X_{(i)})\right] = \frac{i(m-i+1)}{(m+1)^2(m+2)}.$$
This expectation and variance lead to two variants of the least squares method. Swain et al. [23] were the first to use the LS methods for parameter estimation of the beta distribution.
2.3.1. OLS Method
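The OLS estimators $\hat\alpha_{OLS}$ and $\hat\theta_{OLS}$ of α and θ, respectively, can be found by minimizing the following function with respect to α and θ:
$$S(\alpha,\theta) = \sum_{i=1}^{m}\left[F(x_{(i)};\alpha,\theta) - \frac{i}{m+1}\right]^2,$$
where $x_{(1)} \le \cdots \le x_{(m)}$ are the ordered RSS observations and m = nk.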
2.3.2. WLS Method
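The WLS estimators of α and θ, denoted $\hat\alpha_{WLS}$ and $\hat\theta_{WLS}$, can be determined by minimizing the following function with respect to α and θ:
$$W(\alpha,\theta) = \sum_{i=1}^{m}\frac{(m+1)^2(m+2)}{i(m-i+1)}\left[F(x_{(i)};\alpha,\theta) - \frac{i}{m+1}\right]^2.$$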
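The OLS and WLS criteria translate directly into code. The Python sketch below uses the standard library only, and its names are hypothetical. It applies both criteria to pooled, ordered RSS values. As in the other sketches, the compass-search optimizer and the Exp/Gamma(3) mixture generator for TPXG data are our assumptions.

```python
import math
import random

def tpxg_cdf(x, a, t):
    """TPXG distribution function F(x; alpha=a, theta=t)."""
    return 1 - (1 + a * t * x * (t * x + 2) / (2 * (a + t))) * math.exp(-t * x)

def ols_objective(a, t, xs):
    """S(alpha, theta) for ordered data xs."""
    m = len(xs)
    return sum((tpxg_cdf(x, a, t) - i / (m + 1)) ** 2 for i, x in enumerate(xs, 1))

def wls_objective(a, t, xs):
    """W(alpha, theta): squared deviations weighted by 1 / Var[F(X_(i))]."""
    m = len(xs)
    return sum((m + 1) ** 2 * (m + 2) / (i * (m - i + 1)) * (tpxg_cdf(x, a, t) - i / (m + 1)) ** 2
               for i, x in enumerate(xs, 1))

def compass_search(f, x0, step=0.5, tol=1e-6, max_iter=20000):
    """Minimise f by compass (coordinate pattern) search with step halving."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        if step < tol:
            break
        moved = False
        for d in range(len(x)):
            for s in (step, -step):
                y = list(x)
                y[d] += s
                fy = f(y)
                if fy < fx:
                    x, fx, moved = y, fy, True
                    break
        if not moved:
            step /= 2
    return x

def fit(objective, values, start=(1.0, 1.0)):
    """Minimise objective over (alpha, theta) > 0 via their logarithms."""
    xs = sorted(values)
    g = lambda p: objective(math.exp(p[0]), math.exp(p[1]), xs)
    p = compass_search(g, [math.log(v) for v in start])
    return math.exp(p[0]), math.exp(p[1])

def simulate_rss_values(alpha, theta, k, n, rng):
    """Measured values of a perfect-ranking RSS of size nk (assumed mixture generator)."""
    def draw():
        if rng.random() < theta / (alpha + theta):
            return rng.expovariate(theta)
        return rng.gammavariate(3.0, 1.0 / theta)
    return [sorted(draw() for _ in range(k))[i - 1] for j in range(n) for i in range(1, k + 1)]

values = simulate_rss_values(2.0, 1.5, k=5, n=400, rng=random.Random(11))
ols_est = fit(ols_objective, values)
wls_est = fit(wls_objective, values)
```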
2.4. Methods of Minimum Distances
Several estimation methods are based on minimizing a goodness-of-fit statistic that measures the distance between the empirical and theoretical distribution functions. The Cramer–von Mises and Anderson–Darling methods are considered here (see D’Agostino and Stephens [24]).
2.4.1. CV Method
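The CV estimators $\hat\alpha_{CV}$ and $\hat\theta_{CV}$ of α and θ, respectively, can be found by minimizing the following function with respect to α and θ:
$$C(\alpha,\theta) = \frac{1}{12m} + \sum_{i=1}^{m}\left[F(x_{(i)};\alpha,\theta) - \frac{2i-1}{2m}\right]^2.$$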
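2.4.2. AD Method
The AD estimators $\hat\alpha_{AD}$ and $\hat\theta_{AD}$ of α and θ, respectively, can be found by minimizing the following function with respect to α and θ:
$$A(\alpha,\theta) = -m - \frac{1}{m}\sum_{i=1}^{m}(2i-1)\left\{\log F(x_{(i)};\alpha,\theta) + \log\left[1 - F(x_{(m+1-i)};\alpha,\theta)\right]\right\}.$$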
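A corresponding sketch for the CV and AD criteria follows. It evaluates log F and log[1 − F] from the TPXG survival function, so that it never takes the logarithm of a value rounded to 0 or 1. The optimizer, data generator, and names are again illustrative assumptions.

```python
import math
import random

def tpxg_logsf(x, a, t):
    """log[1 - F(x; alpha=a, theta=t)] for the TPXG distribution."""
    return math.log1p(a * t * x * (t * x + 2) / (2 * (a + t))) - t * x

def tpxg_cdf(x, a, t):
    return -math.expm1(tpxg_logsf(x, a, t))

def cv_objective(a, t, xs):
    """C(alpha, theta) for ordered data xs."""
    m = len(xs)
    return 1 / (12 * m) + sum((tpxg_cdf(x, a, t) - (2 * i - 1) / (2 * m)) ** 2
                              for i, x in enumerate(xs, 1))

def ad_objective(a, t, xs):
    """A(alpha, theta) for ordered data xs, using logs of F and 1 - F."""
    m = len(xs)
    log_s = [tpxg_logsf(x, a, t) for x in xs]          # log[1 - F(x_(i))]
    log_f = [math.log(-math.expm1(v)) for v in log_s]  # log F(x_(i))
    total = sum((2 * i - 1) * (log_f[i - 1] + log_s[m - i]) for i in range(1, m + 1))
    return -m - total / m

def compass_search(f, x0, step=0.5, tol=1e-6, max_iter=20000):
    """Minimise f by compass (coordinate pattern) search with step halving."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        if step < tol:
            break
        moved = False
        for d in range(len(x)):
            for s in (step, -step):
                y = list(x)
                y[d] += s
                fy = f(y)
                if fy < fx:
                    x, fx, moved = y, fy, True
                    break
        if not moved:
            step /= 2
    return x

def fit(objective, values, start=(1.0, 1.0)):
    """Minimise objective over (alpha, theta) > 0 via their logarithms."""
    xs = sorted(values)
    g = lambda p: objective(math.exp(p[0]), math.exp(p[1]), xs)
    p = compass_search(g, [math.log(v) for v in start])
    return math.exp(p[0]), math.exp(p[1])

def simulate_rss_values(alpha, theta, k, n, rng):
    """Measured values of a perfect-ranking RSS of size nk (assumed mixture generator)."""
    def draw():
        if rng.random() < theta / (alpha + theta):
            return rng.expovariate(theta)
        return rng.gammavariate(3.0, 1.0 / theta)
    return [sorted(draw() for _ in range(k))[i - 1] for j in range(n) for i in range(1, k + 1)]

values = simulate_rss_values(2.0, 1.5, k=5, n=400, rng=random.Random(13))
cv_est = fit(cv_objective, values)
ad_est = fit(ad_objective, values)
```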
3. Simulation and Discussion
In this section, a simulation study is conducted using R software. For different parameter values and sample sizes, 1000 samples from the TPXG distribution are generated under both the RSS and SRS designs. The simulation assumes perfect ranking in the RSS design, i.e., no ranking errors. The number of cycles is set to n = 10, 15, and 20, and the set size is k = 4, 5, and 6. The SRS size is nk, equal to the number of measured units in the RSS design. To compare the SRS and RSS designs, the estimated mean squared error (MSE) of each estimator is computed as
$$MSE(\hat\alpha) = \frac{1}{1000}\sum_{i=1}^{1000}\left(\hat\alpha_i - \alpha\right)^2, \qquad MSE(\hat\theta) = \frac{1}{1000}\sum_{i=1}^{1000}\left(\hat\theta_i - \theta\right)^2,$$
where $\hat\alpha_i$ and $\hat\theta_i$ denote the estimates of α and θ, respectively, for the ith simulated sample. The efficiency (Eff) of the RSS estimators relative to the SRS estimators with the same number of measured units is defined by
$$Eff(\hat\alpha) = \frac{MSE(\hat\alpha_{SRS})}{MSE(\hat\alpha_{RSS})}, \qquad Eff(\hat\theta) = \frac{MSE(\hat\theta_{SRS})}{MSE(\hat\theta_{RSS})},$$
so that Eff > 1 indicates that the RSS estimator is more efficient.
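The MSE and efficiency definitions amount to a few lines of code. In this Python sketch, whose names are hypothetical, each list would hold the estimates of one parameter from the simulated samples. A bias helper is included because the study also compares the estimators in terms of bias.

```python
def bias(estimates, true_value):
    """Average estimate minus the true parameter value."""
    return sum(estimates) / len(estimates) - true_value

def mse(estimates, true_value):
    """Estimated mean squared error over the simulated samples."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)

def efficiency(srs_estimates, rss_estimates, true_value):
    """Eff = MSE(SRS) / MSE(RSS); values above 1 favour the RSS estimator."""
    return mse(srs_estimates, true_value) / mse(rss_estimates, true_value)
```

For instance, SRS estimates [1.0, 3.0] and RSS estimates [1.5, 2.5] of a parameter equal to 2 give MSEs of 1 and 0.25, hence Eff = 4.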
For various choices of the parameters, set sizes, and numbers of cycles, the estimates (ES), estimated MSEs, and Eff values are displayed in Table 1, Table 2, Table 3 and Table 4. These results indicate the following:
Table 1.
The ES, MSE, and Eff values obtained with the MLE, MPS, and OLS methods for the TPXG model parameters (selected values of α and θ).
Table 2.
The ES, MSE, and Eff values obtained with the WLS, CV, and AD methods for the TPXG model parameters (selected values of α and θ).
Table 3.
The ES, MSE, and Eff values obtained with the MLE, MPS, and OLS methods for the TPXG model parameters (selected values of α and θ).
Table 4.
The ES, MSE, and Eff values obtained with the WLS, CV, and AD methods for the TPXG model parameters (selected values of α and θ).
- Most of the efficiency values are larger than 1. Based on the same number of measured units, the RSS estimators therefore generally perform better than the SRS estimators for all methods;
- Under the RSS design, the MSE values decrease as the number of cycles n increases. For example, when k = 5, the MSEs of the MLEs reported for one of the parameters are 0.0501, 0.0346, and 0.0270 for n = 10, 15, and 20, respectively;
- Under the RSS design, the MSE values also decrease as the set size k increases. For example, when n = 15, the MSEs of the AD estimators reported for one of the parameters are 0.4206, 0.2248, and 0.1693 for k = 4, 5, and 6, respectively;
- Under the SRS design, the MSE values decrease as the sample size increases;
- For the MPS method, the SRS estimators of one of the parameters perform better than the RSS estimators in some cases. In the remaining cases, the RSS estimators are better;
- The MLEs outperform the other estimation methods under both the RSS and SRS schemes in all results reported in the tables.
4. Application to Real Data
This section examines the usefulness of the proposed RSS estimators using a well-known real data set: the waiting times (in minutes) before service of 100 bank customers. These data were studied by Ghitany et al. [25]. The observations, in increasing order, are: 0.8, 0.8, 1.3, 1.5, 1.8, 1.9, 1.9, 2.1, 2.6, 2.7, 2.9, 3.1, 3.2, 3.3, 3.5, 3.6, 4.0, 4.1, 4.2, 4.2, 4.3, 4.3, 4.4, 4.4, 4.6, 4.7, 4.7, 4.8, 4.9, 4.9, 5.0, 5.3, 5.5, 5.7, 5.7, 6.1, 6.2, 6.2, 6.2, 6.3, 6.7, 6.9, 7.1, 7.1, 7.1, 7.1, 7.4, 7.6, 7.7, 8.0, 8.2, 8.6, 8.6, 8.6, 8.8, 8.8, 8.9, 8.9, 9.5, 9.6, 9.7, 9.8, 10.7, 10.9, 11.0, 11.0, 11.1, 11.2, 11.2, 11.5, 11.9, 12.4, 12.5, 12.9, 13.0, 13.1, 13.3, 13.6, 13.7, 13.9, 14.1, 15.4, 15.4, 17.3, 17.3, 18.1, 18.2, 18.4, 18.9, 19.0, 19.9, 20.6, 21.3, 21.4, 21.9, 23.0, 27.0, 31.6, 33.1, 38.5.
The TPXG distribution is fitted to these data. We considered several criteria: the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the Hannan–Quinn information criterion (HQIC), and the consistent Akaike information criterion (CAIC). Details of these criteria can be found in Akaike [26], Schwarz [27], Hannan and Quinn [28], and Bozdogan [29]. Additionally, the Kolmogorov–Smirnov (KS) statistic is computed for each model.
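The formulae for these criteria are
$$AIC = -2L + 2h, \quad CAIC = -2L + \frac{2hn}{n-h-1}, \quad HQIC = -2L + 2h\log(\log n), \quad BIC = -2L + h\log n,$$
and
$$KS = \max_{1\le i\le n}\left\{\frac{i}{n} - F(x_{(i)}),\ F(x_{(i)}) - \frac{i-1}{n}\right\}.$$
Here h is the number of parameters, n is the sample size, L is the maximized log-likelihood, and F is the fitted CDF evaluated at the ordered observations $x_{(1)} \le \cdots \le x_{(n)}$.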
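The model-selection criteria and the KS distance can be computed as follows. This Python sketch, whose names are hypothetical, takes the maximized log-likelihood L, the number of parameters h, and the sample size n. CAIC is coded in the small-sample form 2hn/(n − h − 1) given above. Other definitions of CAIC exist, for example −2L + h(log n + 1).

```python
import math

def information_criteria(loglik, h, n):
    """AIC, BIC, CAIC (small-sample form) and HQIC from the maximised log-likelihood."""
    return {
        "AIC": -2 * loglik + 2 * h,
        "BIC": -2 * loglik + h * math.log(n),
        "CAIC": -2 * loglik + 2 * h * n / (n - h - 1),
        "HQIC": -2 * loglik + 2 * h * math.log(math.log(n)),
    }

def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and a fitted CDF."""
    xs = sorted(data)
    n = len(xs)
    return max(max(i / n - cdf(x), cdf(x) - (i - 1) / n) for i, x in enumerate(xs, 1))
```

For the bank data, `loglik` would be the maximized TPXG log-likelihood with h = 2 and n = 100. `cdf` would be the TPXG CDF evaluated at the fitted parameters.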
Because the distribution under study has two parameters, we compared its fit with two two-parameter distributions: the Darna distribution and the Marshall–Olkin Esscher transformed Laplace distribution. We also included one one-parameter distribution, the inverse length-biased Maxwell distribution. The pdfs of these distributions are given below.
- Darna distribution with pdf:
- Marshall–Olkin Esscher transformed Laplace distribution (MOETL) with pdf:
- Inverse length-biased Maxwell distribution (ILBMD) with pdf:
The results, reported in Table 5, show that the TPXG distribution fits better than the competing continuous models, since it has the smallest values of all criteria, including the KS distance. Figure 4 supports this claim.
Table 5.
The goodness-of-fit measures for the bank customers data.
Figure 4.
Plots of estimated pdf and CDF for the bank customers data.
The total time on test (TTT) plot plays a vital role in selecting a proper model for the data, because it indicates the shape of the failure rate. The shapes are read as follows:
- If the plot is a straight diagonal line, the data have a constant failure rate.
- If the plot is convex, the failure rate is decreasing; if it is concave, the failure rate is increasing.
- If the plot is first convex and then concave, the failure rate is bathtub-shaped.
- If the plot is first concave and then convex, the failure rate has an inverted bathtub shape.

Figure 5 gives the TTT and density plots of the TPXG distribution for the bank customers data. Figure 6 gives the probability–probability (P-P) and quantile–quantile (Q-Q) plots of the TPXG model for the real data. Figure 7 presents the box and bee swarm plots of these data.
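The empirical scaled TTT transform behind such plots can be computed as below. It uses the usual definition T(i/n) = [∑_{r≤i} x_(r) + (n − i)x_(i)] / ∑_{r≤n} x_(r). This is a minimal sketch, and the function name is ours. Plotting the returned points against the diagonal gives the TTT plot.

```python
def scaled_ttt(data):
    """Empirical scaled TTT points (i/n, T(i/n)), i = 0, 1, ..., n."""
    xs = sorted(data)
    n, total = len(xs), sum(xs)
    points, partial = [(0.0, 0.0)], 0.0
    for i, x in enumerate(xs, 1):
        partial += x                                   # sum of the i smallest observations
        points.append((i / n, (partial + (n - i) * x) / total))
    return points
```

The curve always runs from (0, 0) to (1, 1) and never decreases. A concave pattern above the diagonal therefore signals an increasing failure rate, in line with the interpretation above.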
Figure 5.
TTT and density plots for TPXG distribution for bank customer data.
Figure 6.
P-P and Q-Q plots for TPXG distribution for bank customers.
Figure 7.
Box and Bee Swarm plots for TPXG distribution for bank customers.
From these data, we take an SRS of size 20. For ranked set sampling, a small set size of k = 5 is used with n = 4 cycles. Table 6 and Table 7 list the RSS (n = 4 and k = 5) and SRS (size 20) samples taken from the bank customers data. The SRS and RSS methods are thus compared based on the same number of measured units. Using the methods of Section 2, we calculate the estimates of α and θ under each design, assuming perfect ranking. To compare the estimators, we use the criteria introduced above: AIC, BIC, CAIC, HQIC, and KS. The results are summarized in Table 8.
Table 6.
RSS sample taken from the bank customers data for n = 4 and k = 5.
Table 7.
SRS sample of size 20 taken from the data set.
Table 8.
Estimates, AIC, BIC, CAIC, HQIC, and KS in the SRS and RSS designs using the MLE, MPS, OLS, WLS, CV, and AD methods.
The findings in Table 8 show that the TPXG parameter estimates based on the RSS method improve on their SRS counterparts. They yield smaller values of AIC, BIC, CAIC, HQIC, and KS for each of the MLE, MPS, OLS, WLS, CV, and AD methods.
5. Conclusions
This paper discusses several estimation methods for the TPXG distribution parameters based on RSS and SRS designs. A simulation study compares the performance of these estimators for the same number of measured units. A real data set is analyzed to illustrate the usefulness of the suggested estimators. The RSS estimators generally outperform the SRS estimators in terms of MSE, bias, and efficiency for all estimation methods considered.

In future work, the topics discussed here can be studied under imperfect ranking (Dell and Clutter [4]), based on RSS or some of its modifications. Examples include balanced groups RSS by Jemain et al. [30], percentile double RSS by Al-Omari and Jaber [31], and new mixed RSS by Hanandeh et al. [32]. Other real data sets can also be used to further investigate the suggested estimators of the TPXG distribution.
Author Contributions
The authors contributed equally to this work. Formal analysis, S.B.; Validation, A.I.A.-O.; Writing—review & editing, A.I.A.-O. and I.M.A. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the Deanship of Scientific Research at King Khalid University through the Research Groups Program under the grant number R.G.P. 2/132/43.
Data Availability Statement
The data used in this study are available within the paper.
Acknowledgments
The authors thank and extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Sen, S.; Maiti, S.S.; Chandra, N. The xgamma distribution: Statistical properties and application. J. Mod. Appl. Stat. Methods 2016, 15, 38.
- McIntyre, G.A. A method for unbiased selective sampling using ranked set sampling. Aust. J. Agric. Res. 1952, 3, 385–390.
- Takahasi, K.; Wakimoto, K. On unbiased estimates of the population mean based on the sample stratified by means of ordering. Ann. Inst. Stat. Math. 1968, 20, 1–31.
- Dell, T.R.; Clutter, J.L. Ranked set sampling theory with order statistics background. Biometrics 1972, 28, 545–555.
- Samawi, H.; Abu-Dayyeh, H.A.; Ahmed, S. Estimating the population mean using extreme ranked set sampling. Biom. J. 1996, 38, 577–586.
- Muttlak, H.A. Median ranked set sampling. J. Appl. Stat. Sci. 1997, 6, 245–255.
- Al-Saleh, M.F.; Al-Kadiri, M.A. Double-ranked set sampling. Stat. Probab. Lett. 2000, 48, 205–212.
- Muttlak, H.A. Modified ranked set sampling methods. Pak. J. Stat.-All Ser. 2003, 19, 315–324.
- Al-Nasser, A.D. L-Ranked set sampling: A generalization procedure for robust visual sampling. Commun. Stat. 2007, 6, 33–43.
- Haq, A.; Brown, J.; Moltchanova, E.; Al-Omari, A.I. Partial ranked set sampling design. Environmetrics 2013, 24, 201–207.
- Zamanzade, E.; Al-Omari, A.I. New ranked set sampling for estimating the population mean and variance. Hacet. J. Math. Stat. 2016, 45, 1891–1905.
- Abu-Dayyeh, W.A.; Al-Subh, S.A.; Muttlak, H.A. Logistic parameters estimation using simple random sampling and ranked set sampling data. Appl. Math. Comput. 2004, 150, 543–554.
- Al-Omari, A.I.; Benchiha, S.; Almanjahie, I.M. Efficient Estimation of the Generalized Quasi-Lindley Distribution Parameters under Ranked Set Sampling and Applications. Math. Probl. Eng. 2021, 1, 9982397.
- Yousef, O.M.; Al-Subh, S.A. Estimation of Gumbel parameters under ranked set sampling. J. Mod. Appl. Stat. Methods 2014, 13, 24.
- Akgul, F.G.; Şenoglu, B. Estimation of P (X< Y) using ranked set sampling for the Weibull distribution. Qual. Technol. Quant. Manag. 2017, 14, 296–309.
- Hussian, M.A. Bayesian and maximum likelihood estimation for Kumaraswamy distribution based on ranked set sampling. Am. J. Math. Stat. 2014, 4, 30–37.
- Chen, W.; Xie, M.; Wu, M. Parametric estimation for the scale parameter for scale distributions using moving extremes ranked set sampling. Stat. Probab. Lett. 2013, 83, 2060–2066.
- Al-Omari, A.I.; Bouza, C.N. Ratio estimators of the population mean with missing values using ranked set sampling. Environmetrics 2015, 26, 67–76.
- Al-Omari, A.I. Maximum likelihood estimation in location-scale families using varied L ranked set sampling. RAIRO-Oper. Res. 2021, 55, S2759–S2771.
- Hassan, A.S.; Al-Omari, A.I.; Nagy, H.F. Stress–strength reliability for the generalized inverted exponential distribution using MRSS. Iran. J. Sci. Technol. Trans. Sci. 2021, 45, 641–659.
- Cheng, R.C.H.; Amin, N.A.K. Maximum product-of-spacings estimation with applications to the log-Normal distribution; Technical Report; Department of Mathematics, University of Wales: Cardiff, UK, 1979.
- Cheng, R.C.H.; Amin, N.A.K. Estimating parameters in continuous univariate distributions with a shifted origin. J. R. Stat. Soc. Ser. B 1983, 45, 394–403.
- Swain, J.; Venkatraman, S.; Wilson, J. Least squares estimation of distribution function in Johnson’s translation system. J. Stat. Comput. Simul. 1988, 29, 271–297.
- D’Agostino, R.B.; Stephens, M.A. Goodness-of-Fit Techniques; Marcel Dekker: New York, NY, USA, 1986.
- Ghitany, M.E.; Atieh, B.; Nadarajah, S. Lindley distribution and its application. Math. Comput. Simul. 2008, 78, 493–506.
- Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723.
- Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
- Hannan, E.J.; Quinn, B.G. The Determination of the order of an autoregression. J. R. Stat. Soc. Ser. B 1979, 41, 190–195.
- Bozdogan, H. Model selection and Akaike’s information criterion (AIC): The general theory and its analytical extensions. Psychometrika 1987, 52, 345–370.
- Jemain, A.A.; Al-Omari, A.I.; Ibrahim, K. Balanced groups ranked set sampling for estimating the population median. J. Appl. Stat. 2008, 17, 39–46.
- Al-Omari, A.I.; Jaber, K. Percentile double ranked set sampling. J. Math. Stat. 2008, 4, 60–64.
- Hanandeh, A.A.; Al-Nasser, A.D.; Al-Omari, A.I. New mixed ranked set sampling variations. Stat 2022, 11, e417.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).