Abstract
The concept of length-biased distribution is applied in developing proper models for lifetime data. The length-biased distribution is a special case of the well-known weighted distribution. In this article, we introduce a length-biased weighted Lomax distribution (LBWLD) in the presence of k outliers and estimate the parameter R = P(Y < X) when the random variables X and Y are independent and follow the LBWLD with and without outliers, respectively. The bias and mean square error (MSE) of the estimator are examined through numerical simulation and bootstrap resampling. A real data set is analyzed for illustrative purposes.
1. Introduction
In recent years, the Lomax distribution (Pareto type II distribution) has been applied in life testing because it fits business failure data. The distribution was introduced by Lomax [1]. In 2010, Abd-Elfattah and Alharbey [2] estimated the parameters of the Lomax distribution based on generalized probability weighted moments. The Lomax distribution has since been used in a number of theoretical settings. Weighted distribution theory gives a unified approach to problems of model specification and data interpretation. Weighted distributions arise frequently in studies related to reliability, survival analysis, analysis of family data, biomedicine, ecology and several other areas; see Stene [3] and Oluyede and George [4]. Weighted distributions were also studied by Gupta and Tripathi [5] and Patil et al. [6]. Das and Roy [7] studied the length-biased weighted generalized Rayleigh distribution and its properties, and they also devised a length-biased weighted Weibull distribution [8]. Many authors have reported important findings on weighted distributions. A unified concept of weighted distributions was proposed by Rao [9], who identified sampling situations that can be modeled through weighted distributions. In the context of reliability, stress-strength models describe the life of a component that has a random strength X and is subject to a random stress Y. The component fails at the instant the applied stress exceeds its strength, and it functions properly as long as X > Y. Consequently, R = P(Y < X) is a measure of component reliability and is called the reliability parameter. This quantity is of practical importance in many applications. For example, if X is the response for a control group and Y is the response for a treatment group, then R = P(Y < X) is a measure of the effect of the treatment. R = P(Y < X) can also be applied when estimating the heritability of a genetic trait.
For more information on R, see Halperin et al. [10], Simonoff et al. [11], Reiser and Faraggi [12] and Bamber [13]. Bamber [13] provides a geometrical interpretation of A(X, Y) = P(X < Y) + (1/2) P(X = Y) and shows that A(X, Y) is a useful measure of the size of the difference between two populations. Weerahandi and Johnson [14] proposed inferential procedures for P(X > Y) assuming that X and Y are independent normal random variables. Nadarajah [15] examined reliability for the Laplace distribution and its generalizations, where he proposed the double Lomax distribution and its application to the stress-strength model. The double Lomax distribution is the distribution of the ratio of two independent and identically distributed classical Laplace random variables. For given random variables X and Y, the distribution of the ratio X/Y is of interest in biological and physical sciences, econometrics, and ranking and selection. Examples include Mendelian inheritance ratios in genetics, mass to energy ratios in nuclear physics, target to control precipitation in meteorology, inventory ratios in economics, and the stress-strength model in reliability. The function R = P(X > Y) is of practical use in many settings, such as clinical trials, genetics, and reliability. Let X be a nonnegative random variable with probability density function (PDF) f(x). Then the PDF of the weighted random variable X_w is given by
$$f_w(x) = \frac{w(x)\, f(x)}{E[w(X)]}, \qquad x > 0,$$
where w(x) is a nonnegative weight function with 0 < E[w(X)] < ∞.
The random variable X_w is called the weighted version of X, and its distribution is called the weighted distribution with weight function w(x). In particular, for w(x) = x the resulting distribution is called a length-biased distribution, and the PDF of the length-biased random variable is
$$f_L(x) = \frac{x\, f(x)}{E[X]}, \qquad x > 0.$$
1.1. Lomax Distribution and Properties
A random variable X follows the Lomax distribution (LD) with two parameters θ (shape parameter) and λ (scale parameter) if it has the following PDF
It is easy to show that
The corresponding cumulative distribution function (CDF) is given by
Let X₁, X₂, …, Xₙ be a random sample of size n from the LD. The maximum likelihood estimators (MLEs) of the parameters θ and λ, denoted by θ̂ and λ̂, are obtained through direct maximization of the log-likelihood function. Differentiating the log-likelihood function with respect to the parameters gives the normal equations
The MLEs are the simultaneous solutions of the nonlinear normal Equations (1). Simplifying (1), we have
The estimators cannot be written in a closed-form expression.
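Because the normal equations have no closed-form solution, the MLEs have to be found numerically. The following is a minimal sketch, not the authors' procedure: it assumes the common scale parametrization f(x) = (θ/λ)(1 + x/λ)^(−(θ+1)), which may differ from the one used in the paper. Under that assumption the θ score equation can be solved exactly for fixed λ, and the remaining one-dimensional profile log-likelihood is maximized by a golden-section search. The function names and the search bracket are hypothetical.

```python
import math
import random

def lomax_loglik(theta, lam, xs):
    """Log-likelihood under the ASSUMED density (theta/lam) * (1 + x/lam)^-(theta+1)."""
    n = len(xs)
    s = sum(math.log1p(x / lam) for x in xs)
    return n * math.log(theta) - n * math.log(lam) - (theta + 1.0) * s

def theta_given_lam(lam, xs):
    """Solves the theta score equation n/theta - sum(log(1 + x/lam)) = 0 for fixed lam."""
    return len(xs) / sum(math.log1p(x / lam) for x in xs)

def profile_loglik(lam, xs):
    """Log-likelihood with theta replaced by its conditional maximizer."""
    return lomax_loglik(theta_given_lam(lam, xs), lam, xs)

def lomax_mle(xs, lo=1e-3, hi=1e3, tol=1e-8):
    """Golden-section search over log(lam) in [log(lo), log(hi)] (hypothetical bracket)."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(lo), math.log(hi)
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if profile_loglik(math.exp(c), xs) > profile_loglik(math.exp(d), xs):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    lam = math.exp(0.5 * (a + b))
    return theta_given_lam(lam, xs), lam

def lomax_sample(theta, lam, n, rng):
    """Inverse-CDF draws, using F(x) = 1 - (1 + x/lam)^-theta."""
    return [lam * ((1.0 - rng.random()) ** (-1.0 / theta) - 1.0) for _ in range(n)]

rng = random.Random(1)
xs = lomax_sample(3.0, 2.0, 500, rng)  # illustrative values, not from the paper
theta_hat, lam_hat = lomax_mle(xs)
```

The search assumes the profile log-likelihood has a single interior maximum inside the bracket; for samples where this fails, the result sits at a bracket end and should not be trusted.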
1.2. LBWLD and Properties
An LBWLD is obtained by using the weight w(x) = x in the weighted Lomax distribution [16]
It is easy to show that
The corresponding CDF is
where θ and λ are the shape and scale parameters, respectively.
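As a worked illustration of the length-biased construction with w(x) = x, the density and CDF can be derived as follows. This sketch assumes the common scale parametrization of the Lomax density and θ > 1, so that E[X] is finite; both are assumptions made here, and the parametrization used in [16] may differ.

```latex
\begin{align*}
f(x;\theta,\lambda) &= \frac{\theta}{\lambda}\Bigl(1+\frac{x}{\lambda}\Bigr)^{-(\theta+1)},
  \qquad E[X]=\frac{\lambda}{\theta-1}\quad(\theta>1),\\
g(x;\theta,\lambda) &= \frac{x\,f(x;\theta,\lambda)}{E[X]}
  = \frac{\theta(\theta-1)}{\lambda^{2}}\,x\Bigl(1+\frac{x}{\lambda}\Bigr)^{-(\theta+1)},
  \qquad x>0,\\
G(x;\theta,\lambda) &= \int_{0}^{x} g(t;\theta,\lambda)\,dt
  = 1-\Bigl(1+\frac{\theta x}{\lambda}\Bigr)\Bigl(1+\frac{x}{\lambda}\Bigr)^{-\theta}.
\end{align*}
```

Differentiating G with respect to x recovers g, and G(x) tends to 1 as x grows because θ > 1.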
Let a random sample of size m be drawn from the LBWLD with parameters θ and λ. The log-likelihood function is
The partial derivatives of the log-likelihood function with respect to θ and λ are, respectively,
The MLEs of θ and λ are obtained as the simultaneous solutions of the nonlinear system of normal equations given by
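For orientation, the log-likelihood and score functions can be written out under the same assumed parametrization g(x; θ, λ) = θ(θ − 1) λ^(−2) x (1 + x/λ)^(−(θ+1)) with θ > 1. The sample is written generically as x_1, …, x_m; this is a sketch under those assumptions and is not claimed to reproduce the paper's Equation (2).

```latex
\begin{align*}
\ell(\theta,\lambda) &= m\log\theta + m\log(\theta-1) - 2m\log\lambda
  + \sum_{i=1}^{m}\log x_i - (\theta+1)\sum_{i=1}^{m}\log\Bigl(1+\frac{x_i}{\lambda}\Bigr),\\
\frac{\partial \ell}{\partial\theta} &= \frac{m}{\theta}+\frac{m}{\theta-1}
  - \sum_{i=1}^{m}\log\Bigl(1+\frac{x_i}{\lambda}\Bigr),\\
\frac{\partial \ell}{\partial\lambda} &= -\frac{2m}{\lambda}
  + \frac{\theta+1}{\lambda}\sum_{i=1}^{m}\frac{x_i}{\lambda+x_i}.
\end{align*}
```

For fixed λ, writing S = Σ log(1 + x_i/λ), the first score equation reduces to the quadratic Sθ² − (S + 2m)θ + m = 0, whose root exceeding 1 is θ = [S + 2m + √(S² + 4m²)] / (2S). The second equation has no such closed form, which is why the system is solved numerically.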
2. Joint Distribution of Random Variables in Presence of k Outliers
According to Dixit [17], let the random variables X₁, X₂, …, Xₙ be such that k of them (the outliers) follow the LBWLD with one shape parameter and scale parameter λ, and the remaining (n − k) random variables follow the LBWLD with a different shape parameter and the same scale parameter λ. The joint distribution of X₁, X₂, …, Xₙ in the presence of k outliers is expressed as
where .
It is easy to show that the marginal distribution of X is expressed as [18]
The CDF is
where and .
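In outlier models of this kind the marginal distribution of a single observation is a two-component mixture with weight k/n on the outlier component. The sketch below codes that mixture for the LBWLD. It assumes the parametrization g(x) = θ(θ − 1) λ^(−2) x (1 + x/λ)^(−(θ+1)) and a common scale; the names `shape_out` and `shape_in` are hypothetical labels for the outlier and non-outlier shape parameters, and the numeric values are for illustration only.

```python
import math

def lbwl_pdf(x, shape, scale):
    """ASSUMED LBWLD density: shape*(shape-1)/scale^2 * x * (1 + x/scale)^-(shape+1)."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    return shape * (shape - 1.0) / scale * z * (1.0 + z) ** (-(shape + 1.0))

def lbwl_cdf(x, shape, scale):
    """Matching CDF: 1 - (1 + shape*x/scale) * (1 + x/scale)^-shape."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    return 1.0 - (1.0 + shape * z) * (1.0 + z) ** (-shape)

def outlier_pdf(x, k, n, shape_out, shape_in, scale):
    """Marginal density: weight k/n on the outlier component, (n-k)/n on the rest."""
    w = k / n
    return w * lbwl_pdf(x, shape_out, scale) + (1.0 - w) * lbwl_pdf(x, shape_in, scale)

def outlier_cdf(x, k, n, shape_out, shape_in, scale):
    """Marginal CDF of the same mixture."""
    w = k / n
    return w * lbwl_cdf(x, shape_out, scale) + (1.0 - w) * lbwl_cdf(x, shape_in, scale)

# Illustrative call with hypothetical role assignments for the shapes.
density_at_one = outlier_pdf(1.0, k=1, n=10, shape_out=7.0, shape_in=5.0, scale=2.0)
```

Setting k = 0 removes the outlier component, so the mixture reduces to the plain LBWLD.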
3. Method of Maximum Likelihood Estimation (MLE)
The MLEs of the parameters are obtained through direct maximization of the log-likelihood function. Taking partial derivatives of the log-likelihood function with respect to the parameters and setting them equal to zero yields the normal equations, and the MLEs are the solutions of this nonlinear system. In practice, it is more convenient to maximize the log-likelihood function numerically with a nonlinear optimization algorithm such as a quasi-Newton method.
Let X₁, X₂, …, Xₙ be a random sample of size n from the LBWLD in the presence of k outliers. Then the log-likelihood function is
The partial derivatives of the log-likelihood function with respect to the parameters are
The MLEs of the parameters are obtained as the simultaneous solutions of the nonlinear system of normal equations given by
Maximum Likelihood Estimator of R = P(Y < X)
Let the random variables X and Y be independent, where Y follows the LBWLD and the random variables X₁, X₂, …, Xₙ are such that k of them follow the LBWLD with one shape parameter and the remaining (n − k) follow the LBWLD with another shape parameter. The parameters α, β and θ are the shape parameters and λ is the common scale parameter. Using the marginal distribution of X given by (3), we obtain
The value of R does not depend on the scale parameter λ.
If we set k = 0, then X and Y both follow the LBWLD without outliers, with different shape parameters and an identical scale parameter. So
Suppose Y₁, Y₂, …, Yₘ is a random sample of Y from the LBWLD. The MLEs of the parameters are obtained as the simultaneous solutions of the nonlinear system of normal Equations (2) and (4)–(6). Substituting these MLEs into R gives the MLE of R.
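The claim that R is free of the scale parameter can be checked numerically. The sketch below uses a closed form derived here, not copied from the paper: for independent LBWLD variables with a common scale and shapes a (for Y) and b (for X), both greater than 1, integrating the CDF of Y against the density of X gives P(Y < X) = 1 − b(b − 1)(3a + b − 2) / [(a + b)(a + b − 1)(a + b − 2)]. With outliers, R is the k/n mixture of two such terms. The Monte Carlo check relies on the fact that X/λ has a beta prime(2, shape − 1) law, i.e. a ratio of independent gamma variables. All function and argument names are hypothetical, as is the assignment of the numeric values to roles.

```python
import random

def p_y_less_x(shape_y, shape_x):
    """P(Y < X) for independent LBWLD variables with a common scale (shapes > 1)."""
    a, b = shape_y, shape_x
    s = a + b
    return 1.0 - b * (b - 1.0) * (3.0 * a + b - 2.0) / (s * (s - 1.0) * (s - 2.0))

def reliability(shape_y, shape_out, shape_in, k, n):
    """R under the mixture marginal: weight k/n on the outlier shape."""
    w = k / n
    return w * p_y_less_x(shape_y, shape_out) + (1.0 - w) * p_y_less_x(shape_y, shape_in)

def draw_lbwl(shape, scale, rng):
    """One LBWLD draw as scale * Gamma(2, 1) / Gamma(shape - 1, 1)."""
    return scale * rng.gammavariate(2.0, 1.0) / rng.gammavariate(shape - 1.0, 1.0)

def monte_carlo_r(shape_y, shape_out, shape_in, k, n, scale, draws, rng):
    """Fraction of simulated pairs with Y < X, with X drawn from the mixture marginal."""
    hits = 0
    for _ in range(draws):
        shape_x = shape_out if rng.random() < k / n else shape_in
        x = draw_lbwl(shape_x, scale, rng)
        y = draw_lbwl(shape_y, scale, rng)
        hits += y < x
    return hits / draws

r_exact = reliability(shape_y=4.0, shape_out=7.0, shape_in=5.0, k=1, n=10)
```

Calling `monte_carlo_r` with different values of `scale` should return estimates that agree with `r_exact` up to simulation noise, which illustrates the scale invariance. Equal shapes give P(Y < X) = 1/2, as expected by symmetry.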
4. Numerical Experiments
Random samples from the LBWLD in the presence of outliers are generated by solving the equation F(x) = u, where u is an observation from the uniform distribution on (0, 1) and F(x) is the CDF of the LBWLD in the presence of outliers. The simulation uses 1000 iterations for each pair of sample sizes (m, n) = (10, 10), (20, 20), (50, 50), (10, 30), (10, 50), (30, 10), (50, 10), in the two cases of known and unknown λ, for λ = 2, α = 7, β = 5 and θ = 4. The average biases and the mean squared errors of the estimates of the parameters and of R were computed for these simulations. The results have been truncated to four decimal places.
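The inversion step F(x) = u has no closed-form solution for a mixture CDF, so it has to be solved numerically. The paper's computations were done in R; the following Python sketch only illustrates the idea with bisection. It assumes the CDF G(x) = 1 − (1 + θx/λ)(1 + x/λ)^(−θ) for each component and the k/n mixture marginal; the function names and the role assignment of the shape values are hypothetical.

```python
import random

def lbwl_cdf(x, shape, scale):
    """ASSUMED LBWLD CDF: 1 - (1 + shape*x/scale) * (1 + x/scale)^-shape."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    return 1.0 - (1.0 + shape * z) * (1.0 + z) ** (-shape)

def outlier_cdf(x, k, n, shape_out, shape_in, scale):
    """Mixture marginal CDF with weight k/n on the outlier component."""
    w = k / n
    return w * lbwl_cdf(x, shape_out, scale) + (1.0 - w) * lbwl_cdf(x, shape_in, scale)

def invert_cdf(u, cdf, tol=1e-12):
    """Solves cdf(x) = u for x >= 0 by bisection on an expanding bracket."""
    lo, hi = 0.0, 1.0
    while cdf(hi) < u:  # grow the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * (1.0 + hi):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sample_with_outliers(size, k, n, shape_out, shape_in, scale, rng):
    """Draws `size` values by inverting the mixture CDF at uniform variates."""
    def cdf(x):
        return outlier_cdf(x, k, n, shape_out, shape_in, scale)
    return [invert_cdf(rng.random(), cdf) for _ in range(size)]

xs = sample_with_outliers(10, k=1, n=10, shape_out=7.0, shape_in=5.0, scale=2.0,
                          rng=random.Random(7))
```

Bisection is slower than a Newton step but cannot overshoot, which matters in the heavy right tail of these distributions.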
4.1. Bootstrap Procedure
We consider the bootstrap resampling method for deriving the bias-corrected MLEs based on the idea of Efron [19]. Let x = (x₁, x₂, …, xₙ) be a random sample of size n from the random variable X with distribution function F, and let θ̂ be an estimator of a generic parameter θ that is a known function of F. In this procedure, we generate a large number of pseudo-samples x* = (x₁*, x₂*, …, xₙ*) from the sample x and compute the corresponding bootstrap replicates of θ̂, say θ̂*. The empirical distribution of θ̂* is then used to estimate the distribution function of θ̂. Therefore, we generate N bootstrap samples independently from the sample x and compute the corresponding bootstrap estimates (θ̂*(1), θ̂*(2), …, θ̂*(N)). The second-order bias-corrected MLE of θ can be obtained as θ̂_B = 2θ̂ − θ̂*(·), where θ̂*(·) = (1/N) Σ θ̂*(i) is the average of the bootstrap estimates [20].
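The bias-correction step can be sketched generically: estimate the bias as the mean of the bootstrap replicates minus the original estimate, then subtract it, which gives twice the estimate minus the bootstrap mean. The example below is not the paper's LBWLD fit. To stay self-contained it uses a hypothetical stand-in estimator, the MLE of an exponential rate, which is known to be biased upward in small samples; the function names are also hypothetical.

```python
import random

def bootstrap_bias_correct(sample, estimator, n_boot, rng):
    """Returns (corrected, original, bootstrap mean) for a nonparametric bootstrap."""
    estimate = estimator(sample)
    n = len(sample)
    # Each replicate re-estimates on n values drawn with replacement from the sample.
    replicates = [estimator(rng.choices(sample, k=n)) for _ in range(n_boot)]
    boot_mean = sum(replicates) / n_boot
    return 2.0 * estimate - boot_mean, estimate, boot_mean

def rate_mle(xs):
    """Stand-in estimator: MLE of an exponential rate, n / sum(x)."""
    return len(xs) / sum(xs)

rng = random.Random(42)
data = [rng.expovariate(1.5) for _ in range(20)]  # illustrative data only
corrected, raw, boot_mean = bootstrap_bias_correct(data, rate_mle, 1000, rng)
```

For the models in this paper, `estimator` would be replaced by a routine that returns the numerical MLE of the parameter of interest, at the cost of one optimization per bootstrap sample.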
4.2. Results
Table 1 and Table 2 present the average biases and the MSEs of the estimates for the LBWLD with outliers in the two cases of known and unknown λ; the values obtained with the bootstrap method based on 500 resamples are reported in brackets. The MSEs of the estimates of the parameters and of R generally decrease as the sample sizes n and m increase. The average biases and MSEs decrease when the bootstrap procedure is applied. The sampling experiments and all computations were performed using R (version 3.4.3).
Table 1.
Average Bias and mean square errors (MSEs) for (λ is known, λ = 2, α = 7, β = 5, θ = 4).
Table 2.
Average Bias and MSEs for (λ is unknown, λ = 2, α = 7, β = 5, θ = 4).
5. Application
In order to illustrate the usefulness of the proposed model, we consider the data reported by King et al. [21]. A laboratory investigator interested in the relationship between diet and the development of tumors divided rats into two groups and fed them saturated fat and unsaturated fat diets, respectively. The data represent the tumor-free time, that is, the time from injection to the time that a tumor develops or to the end of the study. Group One contains m = 30 observations (unsaturated fat) and Group Two contains n = 25 observations (saturated fat). The data are as follows.
- Unsaturated Fat
- 112, 68, 84, 109, 153, 143, 60, 70, 98, 164, 63, 63, 77, 91, 91, 66, 70, 77, 63, 66, 66, 94, 101, 105, 108, 112, 115, 126, 161, 178.
- Saturated Fat
- 124, 58, 56, 68, 79, 89, 107, 86, 142, 110, 96, 142, 86, 75, 117, 98, 105, 126, 43, 46, 81, 133, 165, 170, 200, 200.
The required numerical evaluations were carried out using the R software. We fitted length-biased weighted Lomax distributions in the presence of outliers for k = 0, 1, 2 to these data, using maximum likelihood estimation in each case. The MLEs, the estimated standard errors (SE) of the parameters, the corresponding log-likelihood values (−2 log L), the Akaike information criterion values (AIC; [22]), and the P-values of the Kolmogorov-Smirnov test (K-S P-value) are displayed in Table 3.
Table 3.
Maximum likelihood estimates (MLEs), standard errors (SE), log-likelihood values, Akaike information criterion (AIC) values and K-S P-values.
The model with the minimum AIC value was chosen as the best model for the data. From Table 3 we conclude that the LBWLD in the presence of k = 1 outlier fits better than the others.
6. Conclusions
This paper proposed a length-biased weighted Lomax distribution in the presence of outliers. The MLEs of the parameters were investigated, and the problem of estimating R = P(Y < X) for length-biased weighted Lomax distributions in the presence of outliers was addressed. The simulation experiment was carried out for the sample sizes and parameter values listed in Section 4, and the results were given in Table 1 and Table 2. The average bias and the MSE of the estimates based on 1000 replications, and of the bootstrap method based on 500 resamples, were reported. The results indicated that the MSEs of the estimates decrease as the sample sizes increase and that the average biases and MSEs decrease when the bootstrap method is applied. The usefulness of the distributions was illustrated in the analysis of real data. The results indicated that the LBWLD in the presence of k = 1 outlier may be used for a wider range of statistical applications.
Author Contributions
Both authors performed the experiments and contributed to the software and the paper.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Lomax, K.S. Business Failures: Another Example of the Analysis of Failure Data. J. Am. Stat. Assoc. 1954, 49, 847–852.
- Abd-Elfattah, A.M.; Alharbey, A.H. Estimation of Lomax Parameters Based on Generalized Probability Weighted Moment. J. King Abdulaziz Univ. Sci. 2010, 22, 171–184.
- Stene, J. Probability Distributions Arising from the Ascertainment and the Analysis of Data on Human Families and Other Groups. Stat. Distrib. Sci. Work Appl. Phys. Soc. Life Sci. 1981, 6, 233–244.
- Oluyede, B.O.; George, E.O. On Stochastic Inequalities and Comparisons of Reliability Measures for Weighted Distributions. Math. Probl. Eng. 2002, 8, 1–13.
- Gupta, A.K.; Tripathi, R.C. Weighted Bivariate Logarithmic Series Distributions. Commun. Stat. Theory Methods 1996, 25, 1099–1117.
- Patil, G.P.; Rao, C.R.; Ratnaparkhi, M.V. On Discrete Weighted Distribution and their Use in Model Choice for Observed Data. Commun. Stat. Theory Methods 1986, 15, 907–918.
- Das, K.K.; Roy, T.D. Applicability of Length Biased Weighted Generalized Rayleigh Distribution. Adv. Appl. Sci. Res. 2011, 2, 320–327.
- Das, K.K.; Roy, T.D. On Some Length-Biased Weighted Weibull Distribution. Adv. Appl. Sci. Res. 2011, 2, 465–475.
- Rao, C.R. On Discrete Distributions Arising out of Methods of Ascertainment. Indian J. Stat. Ser. A 1965, 27, 311–324.
- Halperin, M.; Gilbert, P.R.; Lachin, J.M. Distribution Free Confidence Intervals for Pr(X1 < X2). Biometrics 1987, 43, 71–80.
- Simonoff, J.S.; Hochberg, Y.; Reiser, B. Alternative Estimation Procedures for Pr(X < Y) in Categorized Data. Biometrics 1986, 42, 895–907.
- Reiser, B.; Faraggi, D. Confidence Bounds for Pr(a’x > b’y). Statistics 1994, 25, 107–111.
- Bamber, D. The Area above the Ordinal Dominance Graph and the Area below the Receiver Operating Characteristic Graphs. J. Math. Psychol. 1975, 12, 387–415.
- Weerahandi, S.; Johnson, R.A. Testing Reliability in a Stress-Strength Model When X and Y are Normally Distributed. Technometrics 1992, 34, 83–91.
- Nadarajah, S. Reliability of Laplace Distributions. Math. Probl. Eng. 2004, 2, 169–183.
- Dixit, U.J.; Nasiri, P.F. Estimation of Parameters of the Exponential Distribution in the Presence of Outliers Generated from Uniform Distribution. Available online: ftp://luna.sta.uniroma1.it/RePEc/articoli/2001-LIX-3_4-13.pdf (accessed on 15 February 2018).
- Dixit, U.J. Characterization of the Gamma Distribution in the Presence of k Outliers. Bull Bombay Math Colloq. 1987, 4, 54–59.
- Ahmad, A.; Ahmad, S.P.; Ahmed, A. Length-Biased Weighted Lomax Distribution: Statistical Properties and Application. Pak. J. Stat. Oper. 2016, 12, 1178.
- Efron, B. The Jackknife, the Bootstrap and other Resampling Plans. CBMS-NSF Regional Conference Series in Applied Mathematics; SIAM: Philadelphia, PA, USA, 1982.
- Wang, M.; Wang, W. Bias-Corrected Maximum Likelihood Estimation of the Parameters of the Weighted Lindley Distribution. Commun. Stat. Simul. Comput. 2017, 46, 530–545.
- King, M.; Bailey, D.M.; Gibson, D.G.; Pitha, J.V.; McCay, P.B. Incidence and Growth of Mammary Tumors Induced by 7, 12-Dimethylbenz[a]anthracene as Related to the Dietary Content of Fat and Antioxidant. J. Nat. Cancer Inst. 1979, 63, 656–664.
- Akaike, H. On entropy maximization principle. In Applications of Statistics; Krishnaiah, P.R., Ed.; Elsevier Science Ltd.: North-Holland, Amsterdam, 1977; pp. 27–41.
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).