A New Parametric Life Distribution with Modified Bagdonavičius–Nikulin Goodness-of-Fit Test for Censored Validation, Properties, Applications, and Different Estimation Methods

In this paper, we first study a new two parameter lifetime distribution. This distribution includes “monotone” and “non-monotone” hazard rate functions which are useful in lifetime data analysis and reliability. Some of its mathematical properties including explicit expressions for the ordinary and incomplete moments, generating function, Renyi entropy, δ-entropy, order statistics and probability weighted moments are derived. Non-Bayesian estimation methods such as the maximum likelihood, Cramer-Von-Mises, percentile estimation, and L-moments are used for estimating the model parameters. The importance and flexibility of the new distribution are illustrated by means of two applications to real data sets. Using the approach of the Bagdonavicius–Nikulin goodness-of-fit test for the right censored validation, we then propose and apply a modified chi-square goodness-of-fit test for the Burr X Weibull model. The modified goodness-of-fit statistics test is applied for the right censored real data set. Based on the censored maximum likelihood estimators on initial data, the modified goodness-of-fit test recovers the loss in information while the grouped data follows the chi-square distribution. The elements of the modified criteria tests are derived. A real data application is for validation under the uncensored scheme.


Introduction
Focusing on one of the most popular positive probability models, the Weibull (W) distribution, in this paper, we introduce a new generalization called the Burr X Weibull (BXW). The W model was proposed in 1951 and is widely used in reliability analysis and in several different fields with different applications, see, for example, ( [1]). Although it includes widely usage, a negative point of the distribution is the limited shape of its hazard function, which can only monotonically increase, Entropy 2020, 22, 592 2 of 24 decrease, or remain constant. Generally, practical problems require a wider range of possibilities in the medium risk, for example, when the lifetime data present a bathtub-shaped hazard function, such as human mortality and machine life cycles. Over several years, researchers have developed various extensions and modified forms of the Weibull distribution with different numbers of parameters. A state-of-the-art survey on the class of such distributions can be found in ( [1]). Some extensions of the W distribution with more than two parameters are available in the literature, such as exponentiated W (Exp-W) ( [2,3]), the additive W ( [4]), the Marshall-Olkin extended W ( [5]), the beta inverse W ( [6]), transmuted exponentiated generatized W ( [7]), Marshall-Olkin additive W ( [8]), the Topp Leone generated W distribution ( [9]), the exponentiated generalized W Poisson ( [10]), Type I general exponential W ( [11]), new four-parameter W ( [12]), Burr XII W ( [13]), Marshall-Olkin generalized W Poisson ( [14]), odd Lindley W ( [15]), Lindley W ( [16]), W generalized W ( [17]), new extended W ( [18]), Type II general exponential W ( [19]), Burr X exponentiated W ( [20]), odd power Lindley W ( [21]), odd Nadarajah-Haghighi W ( [22]), and WW Poisson ( [23]).
In this paper and after studying the mathematical properties of the BXW model, some non-Bayesian methods, such as the maximum likelihood, Cramer-Von-Mises, percentile estimation, and L-moments, is used for estimating the model parameters, are considered. For comparing non-Bayesian methods, simulation studies are provided. The importance and flexibility of the new distribution is illustrated by means of two applications of real data sets. Uncensored applications for comparing the non-Bayesian methods are presented. The censored maximum likelihood estimation is derived and, using the approach of the Bagdonavicius-Nikulin goodness-of-fit test for the right censored validation, we propose and apply a modified chi-square goodness-of-fit test for a new model. The modified goodness-of-fit statistics test was applied for the right censored real data set. Based on the censored maximum likelihood estimators on initial data, the modified goodness-of-fit test recovered the loss in information, while the grouped data followed the chi-square distribution. The elements of the modified criteria tests are derived. Finally, a real data application for validation under the uncensored scheme is presented.

The BXW Model
The cumulative distribution function (CDF) of the two parameter W distributions is given by where α > 0 scale parameter and β > 0 are shape parameters. Clearly, for α = 1, the two parameter W model reduces to one parameter W model. For β = 1, we get the standard exponential model. The CDF and the probability density function (PDF) of one parameter W model are respectively. Depending on Equation (1), we defined and study a new lifetime model called the BXW distribution. Its main characteristic is that one shape parameter is added in Equation (1) to provide more flexibility for the generated distribution. Based on the Burr X-G (BX-G) family pioneered by [24] we constructed the two-parameter BXW model and give a comprehensive description of some of its mathematical properties. The new distribution has the advantage of being capable of modeling various shapes of aging and failure criteria. Further, the BXW model is shown to fit better than at least six other competitive models, each having the same number of parameters. We aim to attract wider applications in engineering, medicine, and other areas of research. [24] defined the CDF of the BX-G family by and ξ = ξ k = (ξ 1 , ξ 2 , . . .) are parameter vectors. The CDF corresponding to Equation (2) becomes where θ is the shape parameter. A random variable (RV) X with PDF (3) is denoted by X ∼ BXG(θ, ξ). By substituting Equation (1) with Equation (2), the two-parameter BXW CDF of X is given by (for x > 0): The PDF corresponding to Equation (4) is given by where θ and β are the shape parameters representing the different shapes of the BXW distribution. We denote a RV X with PDF (5) by X ∼ BXW(θ, β). The reliability function (RF), hazard rate function (HRF), and cumulative hazard rate function (CHRF) of X are given by respectively. The PDF and HRF plots of the BXW distribution for some parameters are given below. Hereafter, we provide a very useful linear representation for the BXW density function. First, we consider the two power series: where u 1 u 2 < 1 and b > 0. By using the above power series, and after performing some algebra, the PDF (5) can be expressed as: where Equation (7) reveals that the PDF of X can be expressed as a linear mixture of Exp-W densities. Therefore, several mathematical properties of the new family can be obtained by knowing those of the Exp-W distribution. Similarly, the CDF of the BXW distribution can also be expressed as a mixture of Exp-W, CDF given by: where is the CDF of the Exp-W distribution with power parameter k + 2( j + 1). According to [25], the BXW distribution can be expressed through the following functional composition: where G θ 2 is G 2 to the power θ and Q L refers to the quantile function of a loglogistic model with parameters equal to 1, namely, the odds function: Reference [25] also studied the function Q L (F), where F is any given CDF. In particular, Q L G β (x) is a convex function; therefore, stochastic ordering properties of the BXW model can be derived straightforwardly through Theorem 1 of [25].
Setting r = 1 in Equation (9), we have the mean of X. The last integration can be computed numerically for most parent distributions. The sth incomplete moment, say ( ), of can be expressed using Equation (7), for > − , as The mean deviations about the mean δ 1 = (| − 1 ′ |) and about the median δ 2 = (| − |)

Some Moments
The rth ordinary moment of X is given by: We then obtain (for any r > −β) Setting r = 1 in Equation (9), we have the mean of X. The last integration can be computed numerically for most parent distributions. The sth incomplete moment, say u s (t), of X can be expressed using Equation (7), for s > −β, as The mean deviations about the mean δ 1 = E X − µ 1 and about the median δ 2 = E(|X − M|) of X are given by δ 1 = 2µ 1 F µ 1 − 2u 1 µ 1 and δ 2 = µ 1 − 2u 1 (M), respectively, where µ 1 = E(X), M is the median of X, F µ 1 is easily calculated from Equation (4), and u 1 (t) is the first incomplete moment given by (10) with s = 1, as The main applications of the first incomplete moment refer to the mean deviations and the Bon-ferroni and Lorenz curves. These curves are very useful in economics, reliability, demography, insurance, and medicine. The Lorenz, say L F and Bonferroni, say B[F(x)], curves are defined by respectively. Here, we derive L F (x) and B[F(x)] curves for the BXW distribution as follows:

Probability Weighted Moments (PWMs)
The (s,r)th probability weighted moment (PWM) of X, following the BXW distribution, say ρ s,r , is formally defined by: Using Equations (4) and (5), we can write Then, the (s,r)th PWM of X can be expressed (for s > −β) as

Order Statistics
Let X 1 , X 2 , . . . , X n be a random sample from the BXW distribution and let X (1) , X (2) , . . . , X (n) be the corresponding order statistics. The PDF of the ith order statistic, say X i:n , can be written as where B(·, ·) is the beta function. Substituting Equations (4) and (5) in Equation (11) and using a power series expansion, we have: The PDF of X i:n can be expressed as: The density function of the BXW order statistics is a mixture of Exp-W PDF. Based on the last equation, we note that the properties of X i:n follow from those properties of Y 2w+k+2 . For example, the moments of X i:n can be given (for q > −β) by:

Renyi and δ −Entropies
The Renyi entropy of a RV X represents a measure of variation of the uncertainty. The Renyi entropy is defined by: Using the PDF (5), the last equation the Renyi entropy of X is given by The δ-entropy, say H δ (X), can be obtained (for, δ > 0, δ 1) as:

Classical Parameter Estimation
Several approaches for parameter estimation were proposed in the literature. In this article, we will consider the following methods:

I.
The maximum likelihood method; II. Method of Cramer-Von-Mises estimation; III. Method of percentile estimation; IV. Method of L-moments.

The Maximum Likelihood Method
Let x 1 , x 2 , . . . , x n be a random sample from the BXW distribution with parameters θ and β. Let φ = (θ, β) T be the 2 × 1 parameter vector. For determining the MLE of φ, we have the log-likelihood function: Setting the nonlinear system of equations U(θ) = 0 and U(β) = 0 and solving them simultaneously To solve these equations, it is usually more convenient to use nonlinear optimization methods, such as the quasi-Newton algorithm to numerically maximize l(φ).

Cramer-Von-Mises Estimation Method
The Cramer-Von-Mises estimation method of the parameters is based on the theory of minimum distance estimation (see [26]). The Crammer-Von-Mises estimates (CVMEs) of the parameter θ and β are obtained by minimizing the following expression, with respect to (w.r.t.), the parameters θ and β, respectively: The Cramer-Von-Mises estimates (CVMEs) of the parameters are obtained by solving the following non-linear equations: are the values of the first derivatives of the CDF of BXW distribution w.r.t. θ, β, respectively.

Method of Percentile Estimation
Let γ (i) = i 1+n be an estimate of F (φ) x (i) , then the percentile estimators (PerEs) of θ and β can be obtained by minimizing the function with respect to θ and β. Additionally, the PerEs can be obtained by solving the following nonlinear equations:

Method of L-Moments
The L-moments are analogous to the ordinary moments but can be estimated by linear combinations of order statistics. They exist whenever the mean of the distribution exists, even though some higher moments may not exist and are relatively robust to the effects of outliers. Based upon the moments of the order statistics, we can derive explicit expressions for the L-moments of X as infinite weighted linear combinations of the means of suitable BXW order statistics. The L-moments for the population can be obtained from: Entropy 2020, 22, 592 10 of 24 The first four L-moments are given by is the L-moments for the sample. The L-moment estimators of the parameters θ and β can then be obtained numerically.

Numerical Assessment
In this section, we study the performance and accuracy of maximum likelihood estimates of the BXW model parameters by conducting various simulations for different sample sizes and different parameter values. The method for generating samples from the BXW distribution is performed by inverse CDF of BXW and uniform RV, as follows: If where u ∼ U(0, 1), then X ∼ BXW(φ). The simulation study is repeated for N = 5000 times, each with sample size n = 100, 200, and 500 and parameter values (θ, β) = (0.4, 2.5), (3, 0.02), (0.6, 0.6) and (0.19, 2.5). Two quantities are computed in this simulation study: 1. Average bias of the MLEε of the parameter ε = θ, β: 2. Mean square error (MSE) of the MLEε of the parameter ε = θ, β: Table 1 presents the Bias and mean square error (MSE) values of the parameters θ and β for different sample sizes. From the results, we can verify that, as the sample size n increases, the MSEs decay toward zero. We also observe that for all the parametric values, the biases decrease as the sample size n increases.

Graphical Assessment
Graphically, we could perform the simulation experiments to assess the finite sample behavior of the MLEs. The assessment was based on the following algorithm:

1.
Use Equation (13) to generate 1000 samples of size n from the BXW distribution; 2.
Compute the MLEs for the 1000 samples; 3.
Compute the standard errors (SEs) of the MLEs for the 1000 samples (the standard errors (SEs) were computed by inverting the observed information matrix).

4.
Compute the biases and mean square errors given for θ, β.
We repeated these steps for n = 50, 100, . . . , 1000 with θ = 1, β = 1 computing biases and MSEs. Figure 2 (left panel) shows how the two biases vary, with respect to n. Figure 2 (right panel) shows how the two MSEs vary, with respect to n. The broken lines in Figure 2 correspond to the biases at 0. From Figure 2, the biases for each parameter are generally negative and decrease to zero as n→∞, the MSEs for each parameter decrease to zero as n→∞.
We repeated these steps for n = 50, 100, …, 1000 with = 1, = 1 computing biases and MSEs. Figure 2 (left panel) shows how the two biases vary, with respect to n. Figure 2 (right panel) shows how the two MSEs vary, with respect to n. The broken lines in Figure 2 correspond to the biases at 0. From Figure 2, the biases for each parameter are generally negative and decrease to zero as n→∞, the MSEs for each parameter decrease to zero as n→∞.

Simulation Studies for Comparing Non-Bayesian Estimation Methods
A Markov chain Monte-Carlo (MCMC) simulation study was performed for this section to assess and compare the performance of the different estimators of the unknown parameters of the new distribution.
This performance was assessed using the average values (AVs) of estimates and the mean square errors (MSEs).
First, we generated 1000 samples of the BXW distribution, where n = (20, 50, 150, 300), and chose: The AVs and MSEs of MLEs, PerEs, and L-Moments were obtained and listed in Tables 2-5. From Tables 2-5, we noted that all methods performed well.

Non-Bayesian Uncensored Applications for Comparing Models
We provided two applications with two real data sets to prove the flexibility of the BXW distribution. We fitted some of the well-known two parameter lifetime distributions, such as the Weibull, gamma, generalized exponential (GE) ( [27]), exponential geometric (EG) ( [28]), exponential Poisson (EP) ( [29]), and complementary exponential geometric (CEG) ( [30]) distributions, into two real data sets.
The first data set was given by [31] on the prices of the 31 different children's wooden toys on sale in a Suffolk craft shop in April 1991. The data set consisted of 31 observations. The second data set was an uncensored data set from [32], consisting of 34 observations on vinyl chloride data obtained from clean upgradient monitoring wells in mg/L. These data were also analyzed by [33]. The MLE of parameters, such as the maximized log-likelihood function, Akaike information criterion (AIC), Bayesian information criterion (BIC), Hannan-Quinn information criterion (HQIC), consistent Akaike information criterion (CAIC) ( [30]), Anderson-Darling (A * ), and Cramer-von-Mises (W * ) statistics, were determined by fitting the two parameter distributions using the two data sets. The statistics A * and W * were described by [34]. In general, the smaller values of these statistics showed the better fit to the data sets. The MLEs were computed using the limited-memory quasi-Newton code for bound-constrained optimization (L-BFGS-B). The estimated parameters, shown in Tables 6 and 7, were based on MLE procedure reports, whereas the values of goodness-of-fit statistics are given in Tables 8 and 9. In the applications, the information about the hazard shape helped in selecting a particular model. For this aim, a device called the total time on test (TTT) plot ( [35]) was useful.  It is convex shape for decreasing hazards and a concave shape for increasing hazards. The TTT plot for both data sets are presented in Figure 3. These figures indicate that the first and second datasets have bathtub and constant failure rate functions. In both real data sets, the results show that the BXW distribution yields a better fit than other distributions. These conclusions are also confirmed by Figures 4-6.         . Fitted CDFs on the empirical CDF of the second data set. Figure 6. Fitted CDFs on the empirical CDF of the second data set. Tables 10 and 11 give the values of estimators, along with the W * and A * statistics for all methods. From Table 10, we conclude that the PerEs method was the best method for the first data; however, all other methods performed well. From Table 11. we conclude that the L-moment method was the best method for the second data; however, all other methods performed well.

Censored Maximum Likelihood Estimation
Suppose that X 1 , X 2 , . . . . . . , X n is a random sample with right censoring from the BXW distribution. The observed data x i = min(X i , C i ); i = 1, 2, . . . , n are the minimum of the survival time X i and censoring time C i for each subject in the sample. Therefore, x i can be written in the form (x i , δ i ) i=1,...,n , where δ i = 1 if X i is the moment of failure (complete observation) and δ i = 0 if X i is the moment of censoring. The right censoring was assumed to be non-informative, so the expression of the likelihood function is: where R φ (x i ) refers to the RF. The log-likelihood function of the BXW distribution is: and the score functions are obtained as follows: Maximum likelihood estimators of the unknown parameters can be obtained using various techniques, such as software R, the Expectation-maximization (EM) algorithm, or the Newton-Raphson method.

Modified Chi-Squared Type Test for Right Censored Data
Methods for testing the validity of parametric models are in constant development, but the presence of censorship make them unavailable. [36,37] proposed a modified chi-squared test based on Kaplan-Meyer estimators. [38] considered modifications of the Kolmogorov-Smirnov statistic, Anderson-Darling statistic, and the Cramer-Von-Mises statistic for accelerate failure models. In this work, we are interested in the modified chi-squared type test, proposed by [39][40][41] for parametric models with right censored data. Based on maximum likelihood estimators on non-grouped data, this statistic test is also based on the differences between the numbers of observed failures and the numbers of expected failures in the grouped intervals chosen. For this, random grouping intervals are considered as data functions. The description of the construction of this chi-squared type test was developed by [42]. The statistic test was defined as follows. Suppose that X 1 , X 2 , . . . , X n is a random sample with right censoring from a parametric model and a finite time τ. The statistic test is defined as follows: where U j and e j are the observed and expected numbers of failure in grouping intervals and Q is: The limits a j of r random gouging intervals I j = a j−1 , a j were chosen, such as the expected failure times to fall into these intervals, which were the same for each; j = 1, . . . , r − 1,â r = max x (l) , τ .
The estimatedâ j is defined bŷ where Hφ(x l ) is the cumulative HRF (CHRF) of the model distribution. This statistic test Y 2 n follows a chi-squared distribution.

Choice of Random Grouping Intervals
Suppose that X 1 , X 2 , . . . . . . , X n is a random sample with right censoring from the BXW distribution and a finite time τ. In our case, the estimatedâ j is obtained as follows: T is the maximum likelihood estimator of the unknown parameters ϕ = (θ, β) T on initial data and Hφ(x l ) is the cumulative hazard rate function of the BXW distribution.

Quadratic Form Q
To calculate the quadratic form Q of the statistic Y 2 n , and, as its distribution doesn't depend on the parameters, we can use the estimated matricesŴ andĈ and the estimated information matrixÎ. are obtained as belowĈ Therefore, the estimated matrixŴ can be deducted fromĈ.

Estimated Information MatrixÎ
We need also the information matrixÎ of the BXW distribution with right censoring. After difficult calculations and some simplifications, we obtained the elements of the matrix, as follows: As all the components of the statistic were given explicitly, we then obtained the statistic test for the BXW distribution with unknown parameters and right censored data. This statistic follows a chi-squared distribution with r degrees of freedom:

Simulations
An important simulation study is carried out to show the performance of the techniques used and the feasibility of the goodness-of-fit test developed in this work. To this end, we generated N = 10, 000 right censored samples with different sizes, n 1 = 20 and n 2 = 50, n 3 = 150, n 4 = 300, from the BXW model with different parameters. Firstly, we computed the MLEs of the unknown parameters, then the criteria Y 2 of the corresponding samples were provided.

Censored Maximum Likelihood Estimation for BXW
Using R statistical software and the Barzilai-Borwein (BB) algorithm (see [43]), we calculated the maximum likelihood estimators of the unknown parameters, the corresponding bias, and mean square errors (MSEs). The results are given in Table 12. For testing the null hypothesis H 0 , that right censored data come from the BXW model, we computed the criteria statistic Y 2 n (φ), as defined above, for 10, 000 simulated samples from the hypothesized distribution with different sizes (n = 20, 50, 150, 300). We then calculated empirical levels of significance, when Y2 > χ 2 ε (r), correponding to theoretical levels of significance (ε = 0.10, 0.05, 0.01). We chose r = 5. The results are reported in Table 13. The null hypothesis H 0 , for which simulated samples were fitted by BXW distribution, is widely validated for the different levels of significance. Therefore, the test proposed in this work can be used to fit data from this new distribution. [44] gave data from a laboratory investigation and the number of T days until onset of carcinoma was recorded. The data below concerns a group of 19 rats (Group 1 in Pike's article). The two observations with asterisks are censorship times, where the data are: 143,164,188,188,190,192,206,209,213,216, 216 *, 220, 227, 230, 234, 244 *, 246, 265, and 304 (* indicates the censorship). We used the statistics test provided above to verify if these data were modeled by BXW distribution, and, to that end, we first calculated the maximum likelihood estimators of the unknown parameters: Data were grouped into r = 4 intervals, I j . We give the necessary calculus in Table 14. We then obtained the value of the statistic test Y 2 n : Y 2 n = X 2 + Q = 4.106 + 3.5396 = 7.6456

Data Analysis
For significance level ε = 0.05, the critical value χ 2 4 = 9, 4877 was superior than the value of Y 2 n = 7.6456, so we can say that the proposed BXW model fit these data.

Conclusions
In this paper, we proposed a new two-parameter Weibull (W) lifetime model based on the Burr X-G (BX-G) class, called the Burr X Weibull (BXW) distribution, which extends the well-known W model. An obvious reason for generalizing a standard distribution is the fact that the generated model can provide more flexibility to analyze real-life data. We provided some of its mathematical and statistical properties. The BXW density function can be expressed as a linear mixture of exponentiated Weibull PDFs. It is shown, from the plots of the PDF and HRF of the BXW model, that this distribution is very flexible, accommodating a large number of shapes in the hazard function, such as "increasing", "decreasing", "upside down", and "bathtub" (U-HRF). We derived explicit expressions for the ordinary and incomplete moments, quantile and generating functions, and moments of the residual life and reversed residual life model. We also obtained the PDFs of the order statistics and their moments. We discussed the estimation of the model parameters by maximum likelihood, along with numericak and graphical assessemnt via simulation studies. The model parameter was estimated by different methods of estimation, named the maximum likelihood method, the method of Cramer-Von-Mises estimation, the method of percentile estimation. And the method of L-moments. MCMC simulations and two applications were performed to compare the estimation methods. The proposed distribution was applied to two real data sets, which provides a better fit than several other competitive nested and non-nested models. We hope that the proposed model will attract wider application in areas such as engineering, survival, and lifetime data, meteorology, hydrology, economics, and others. Using the approach of the Bagdonavicius-Nikulin goodness-of-fit test for the right censored validation, we propose and apply a modified chi-square goodness-of-fit test for the BXW model. The modified goodness-of-fit statistic test is applied for a right censored real data set. Based on the censored maximum likelihood estimators on initial data, the modified goodness-of-fit test recovers the loss of information, while the grouped data follows the chi-square distribution. The elements of the modified criteria tests are derived. A real data application related to the laboratory investigation is for validation under the uncensored scheme.