A New Log-Logistic Lifetime Model with Mathematical Properties, Copula, Modified Goodness-of-Fit Test for Validation and Real Data Modeling

After defining a new log-logistic model and studying its properties, some new bivariate type versions are derived using the “Farlie-Gumbel-Morgenstern Copula”, “modified Farlie-Gumbel-Morgenstern Copula”, “Clayton Copula”, and “Renyi’s entropy Copula”. Then, using the Bagdonavicius-Nikulin goodness-of-fit (BN-GOF) test for validation, we propose a modified goodness-of-fit test for the new log-logistic model. The modified test is applied to a “right censored” real dataset of survival times. All elements of the modified test are explicitly derived and given. Three real data applications are presented for measuring the flexibility and the importance of the new model under the uncensored scheme. Two other real datasets are analyzed for censored validation.


Introduction
A new univariate version of the log-logistic (LL) model called the Rayleigh generalized log-logistic (RG-LL) distribution (see Equations (3) and (4) and their corresponding details) is introduced, studied, and checked in modeling censored and uncensored real datasets. Following the mathematical development of the RG-LL distribution, some new bivariate RG-LL type distributions are derived in this paper using the Farlie-Gumbel-Morgenstern (FGM) Copula ([1][2][3][4]), the modified Farlie-Gumbel-Morgenstern Copula [5], the Clayton Copula, and Renyi's entropy Copula [6] (see Section 3). Copulas are of interest to statisticians for two major reasons ([7]): "Firstly, as a way of studying scale-free measures of dependence; and secondly, as a starting point for constructing families of bivariate distributions." Specifically, copulas are an important part of the study of dependence between two variables since they allow us to separate the effect of dependence from the effect of the marginal distributions. Future articles could be devoted to studying the new bivariate RG-LL type distributions.
The new model is used for modeling three real datasets. The 1st dataset is "engineering real-life data" and consists of 100 observations of the "breaking stress of carbon fibers" given by [8]. The 2nd dataset is "reliability survival data" and gives the "survival times", in days, of 72 guinea pigs infected with virulent tubercle bacilli; these data were originally observed and reported by [9]. The 3rd dataset is medical data, called the leukemia data. It gives the survival times, in weeks, of 33 patients suffering from acute myelogenous leukemia. These applications are used to illustrate the importance, potentiality, and flexibility of the RG-LL model. The hazard rate function (HRF) of the 1st and 2nd real datasets is "monotonically increasing". However, the HRF of the 3rd is "decreasing-constant-increasing". These real datasets were analyzed by [10][11][12][13][14][15][16][17].
The RG-LL model has only three parameters, whereas its competitive models have three or more parameters, as illustrated in Section 5 (Tables A3-A5). It is worth mentioning that a lifetime model with a smaller number of parameters is favorable, especially when it gives a better (or the same) fit. The RG-LL model has the lowest (best) values of the criteria used (see Tables A3-A5). Therefore, it is recommended to apply the RG-LL model in modeling instead of the other competitive models. For applied purposes, especially in mathematical modeling, the RG-LL model could be useful in the following cases:
1. Modeling the "asymmetric monotonically right skewed" heavy tail data sets (see second and third applications).
2. Modeling the "asymmetric monotonically right skewed" heavy tail data sets for the first time ever (see [18]).
3. In the engineering field, the RG-LL distribution can be applied for modeling the "breaking stress data" which have a "monotonically increasing" HRF. As shown in Table A3, the RG-LL model proved its superiority against many competitive models.
4. In "survival analysis", the RG-LL distribution could be chosen for modeling the "survival times data" which have a "monotonically increasing" HRF as illustrated in Table A4.
5. In the medical field, the RG-LL distribution could be considered in modeling the "leukemia data" which have a "decreasing-constant-increasing" HRF (see Table A5).
For these reasons, we are motivated to introduce and study the RG-LL distribution. For simulation purposes, the "Barzilai-Borwein" (BB) algorithm (see [19]) is used in a simulation study for assessing the performance of the estimators under different sample sizes as the sample size tends to ∞ (for more details, see [20][21][22]). For validation purposes, and using the BN-GOF test under right censored data, we propose a modified chi-square GOF test for the RG-LL model. Based on the maximum likelihood estimators (MLEs) on initial data, the modified BN-GOF test recovers the loss in information caused by grouping the data and follows a chi-square distribution. All elements of the modified BN-GOF test criteria are explicitly derived and given (for more details see [20,23,24]).
Generally, the LL distribution is a continuous model for non-negative random variables (RVs). It is used in survival analysis as a parametric model for events whose rate increases initially and decreases later, such as the mortality rate from cancer following diagnosis or treatment (for more details see [25][26][27][28][29][30][31]). The LL model has also been used in hydrology for modeling stream flow and precipitation. In economics, the LL model is employed as a simple model of the distribution of wealth or income. An RV $Z$ is said to have the one-parameter LL distribution if its cumulative distribution function (CDF) can be written as
$G_{a_2}(z) = 1 - (1 + z^{a_2})^{-1}, \quad z \geq 0,$ (1)
where $a_2 > 0$ refers to the shape parameter. Scale and location parameters can be introduced in many ways to make (1) a three-parameter distribution. It is worth mentioning that the model in (1) is a member of the Pareto Type I distribution. The corresponding probability density function (PDF) of (1) is given by
$g_{a_2}(z) = a_2 z^{a_2 - 1} (1 + z^{a_2})^{-2}.$ (2)
The PDF in (2) is a special member of the well-known Burr type XII (BXII) model (see [25][26][27][28][29][30][31]).
Based on the family of [15] and using (1), the CDF of the Rayleigh generalized LL (RG-LL) model is defined by Equation (3), and the PDF corresponding to (3) is given by Equation (4). For $a_1 = 1$, the RG-LL model reduces to the two parameter RG-LL model. For $\theta = 1$, the RG-LL model reduces to the R-LL model (with three parameters). For $\theta = a_1 = 1$, the RG-LL model reduces to the R-LL model (with two parameters). The HRF can be derived from $h_{\theta,a_1,a_2}(z) = f_{\theta,a_1,a_2}(z) / [1 - F_{\theta,a_1,a_2}(z)]$. Let $B = \inf\{z \mid G_{a_2}(z) > 0\}$; the asymptotics of the CDF, PDF, and HRF can be derived as $z \to B$ and as $z \to \infty$.
Figure 1 gives some plots of the PDF and HRF for the RG-LL model. From Figure 1 (left panel), we conclude that the proposed PDF of the RG-LL model can be "uniform", "unimodal", "symmetric", or "asymmetric left skewed" (or asymmetric right skewed (see Table A1)).
From Figure 1 (right panel), the HRF can be "asymmetric monotonically increasing" ($\theta = 0.95$, $a_1 = 0.01$, $a_2 = 1.5$), "decreasing-constant" ($\theta = 1.5$, $a_1 = 0.01$, $a_2 = 0.35$), "J shaped" ($\theta = 1$, $a_1 = 1$, $a_2 = 20$), or "constant" ($\theta = 1.5$, $a_1 = 0.1$, $a_2 = 0.45$). The rest of the paper is organized as follows. In Section 2, some mathematical properties of the new model are derived. In Section 3, some new bivariate type versions using "Farlie-Gumbel-Morgenstern Copula", "modified Farlie-Gumbel-Morgenstern Copula", "Clayton Copula" and "Renyi's entropy Copula" are obtained. In Section 4, we provide three applications to real data to illustrate the flexibility of the new model. The modified BN-GOF test is presented and applied in Section 5. Simulation experiments under censorship for assessing the new test are performed in Section 6. Censored validation under real data is considered in Section 7. Finally, some concluding remarks are addressed in Section 8.
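As a small numerical companion to the baseline model described in the Introduction, the following sketch evaluates the CDF, PDF, and HRF of the one-parameter LL distribution. It assumes the standard form $G_{a_2}(z) = z^{a_2}/(1 + z^{a_2})$; the function names are hypothetical, and the block does not implement the RG-LL model itself.

```python
def ll_cdf(z, a2):
    """CDF of the one-parameter LL model (assumed form): z^a2 / (1 + z^a2)."""
    if z <= 0:
        return 0.0
    t = z ** a2
    return t / (1.0 + t)


def ll_pdf(z, a2):
    """PDF of the one-parameter LL model: a2 z^(a2-1) / (1 + z^a2)^2."""
    if z <= 0:
        return 0.0
    return a2 * z ** (a2 - 1) / (1.0 + z ** a2) ** 2


def hrf(z, pdf, cdf):
    """Generic hazard rate h(z) = f(z) / (1 - F(z)) for any PDF/CDF pair."""
    return pdf(z) / (1.0 - cdf(z))


def ll_hrf(z, a2):
    """HRF of the baseline LL model, built from the generic definition."""
    return hrf(z, lambda x: ll_pdf(x, a2), lambda x: ll_cdf(x, a2))


if __name__ == "__main__":
    # Tabulate the baseline hazard on a small grid for two shape values.
    for a2 in (0.5, 2.0):
        values = [round(ll_hrf(z, a2), 4) for z in (0.25, 0.5, 1.0, 2.0, 4.0)]
        print("a2 =", a2, "HRF:", values)
```

The generic `hrf` helper takes any PDF/CDF pair, so the same ratio could be applied to the RG-LL functions in Equations (3) and (4) once they are coded.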

Moments and Generating Function
The PDF of the RG-LL model in (4) can be rewritten as a linear representation. First, the term $A_{a_1,a_2,\theta}(z)$ is expanded using the power series. Next, the generalized binomial expansion is applied to the quantity $B_{a_2,\theta,2i_1}(z)$, and then again to the quantity $C_{a_2,\theta(i_2-2i_1)}(z)$, which gives a representation in terms of the CDF of the exponentiated LL (Exp-LL) model. Differentiating this last representation gives Equation (5), which is written in terms of the LL PDF with parameters $a_2$ and $(1 + i_4)$. Similarly, the CDF (3) of the RG-LL model can be re-expressed in the same way.
In terms of the beta function of the second type, $B(a_1, a_2) = \int_0^{\infty} t^{a_1 - 1} (1 + t)^{-(a_1 + a_2)}\,dt$, the $p$th ordinary moment of $Z$ can be expressed as in Equation (7) (see [10,11,16,[32][33][34]). By setting $p = 1$ in (7), we get the mean of $Z$. Similarly, the $p$th incomplete moment of $Z$ can be written in terms of the incomplete beta function of the second type, $B(q; a_1, a_2) = \int_0^{q} t^{a_1 - 1} (1 + t)^{-(a_1 + a_2)}\,dt$. The moment generating function (MGF) $M_Z(t) = E(\exp(tZ))$ of $Z$ can be derived from (5).
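To make the role of the incomplete beta function of the second type concrete, the sketch below evaluates $B(q; a_1, a_2)$ by Simpson's rule and uses it for the $p$th incomplete moment of the baseline LL model, for which the substitution $t = z^{a_2}$ gives $E[Z^p \mathbf{1}(Z \le y)] = B(y^{a_2}; 1 + p/a_2, 1 - p/a_2)$. This is an illustration under the assumption of the standard one-parameter LL form, not the RG-LL formula; all function names are hypothetical.

```python
import math


def simpson(f, lo, hi, n=20000):
    """Composite Simpson rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def beta2_incomplete(q, a1, a2):
    """B(q; a1, a2) = int_0^q t^(a1-1) (1+t)^(-(a1+a2)) dt, assuming a1 >= 1."""
    return simpson(lambda t: t ** (a1 - 1) * (1.0 + t) ** (-(a1 + a2)), 0.0, q)


def ll_incomplete_moment(p, y, a2):
    """E[Z^p 1(Z <= y)] for the baseline LL model (assumption), p < a2."""
    return beta2_incomplete(y ** a2, 1.0 + p / a2, 1.0 - p / a2)


def ll_raw_moment(p, a2):
    """E[Z^p] for the baseline LL model, valid for p < a2."""
    x = p * math.pi / a2
    return x / math.sin(x)


if __name__ == "__main__":
    print("mean for a2 = 2:", ll_raw_moment(1, 2.0))
    print("first incomplete moment up to y = 2:", ll_incomplete_moment(1, 2.0, 2.0))
```

Simpson's rule is used only to keep the example dependency-free; a library routine for the regularized incomplete beta function would normally be preferable.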

Probability Weighted Moments (PWMs)
The $(p, q)$th PWM of $Z$ following the RG-LL model, say $m_{p,q}$, is formally defined by $m_{p,q} = E[Z^p F(Z)^q]$.
The $(p, q)$th PWM of $Z$ can be expressed in terms of the "descending factorial" $(a_1)_{a_2} = a_1 (a_1 - 1) \cdots (a_1 - a_2 + 1)$, where $a_2$ is a positive integer.
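The two ingredients above can be illustrated numerically. The sketch below computes the descending factorial and approximates a PWM through the identity $m_{p,q} = \int_0^1 Q(u)^p u^q\,du$, where $Q$ is the quantile function. The baseline LL quantile $Q(u) = (u/(1-u))^{1/a_2}$ is used as a stand-in (an assumption), and the function names are hypothetical.

```python
def descending_factorial(a1, a2):
    """(a1)_{a2} = a1 (a1 - 1) ... (a1 - a2 + 1) for a positive integer a2."""
    result = 1.0
    for j in range(a2):
        result *= (a1 - j)
    return result


def ll_quantile(u, a2):
    """Quantile function of the baseline LL model (assumed form)."""
    return (u / (1.0 - u)) ** (1.0 / a2)


def pwm(p, q, quantile, n=100000):
    """Midpoint-rule approximation of m_{p,q} = int_0^1 Q(u)^p u^q du."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        total += quantile(u) ** p * u ** q
    return total * h


if __name__ == "__main__":
    print("(5)_3 =", descending_factorial(5, 3))
    print("m_{1,1} for a2 = 4:", pwm(1, 1, lambda u: ll_quantile(u, 4.0)))
```

For $p = 0$ the PWM reduces to $1/(q + 1)$ for any continuous distribution, which gives a quick sanity check on the quadrature.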

Moment of the Reversed Residual Life
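The $p$th moment of the reversed residual life is defined as $A_p(t) = E[(t - Z)^p \mid Z \le t]$ for $t > 0$ and $p = 1, 2, \ldots$, which can be written as $A_p(t) = \frac{1}{F(t)} \int_0^t (t - z)^p f(z)\,dz$. The $p$th moment of the reversed residual life of $Z$ follows by inserting the RG-LL PDF and CDF into this expression.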
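The definition of $A_p(t)$ can be evaluated numerically for any PDF/CDF pair through $A_p(t) = F(t)^{-1} \int_0^t (t - z)^p f(z)\,dz$. The sketch below does this with Simpson's rule, using the baseline LL model with $a_2 = 1$ as an assumed example; the function names are hypothetical.

```python
import math


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def reversed_residual_moment(p, t, pdf, cdf):
    """A_p(t) = E[(t - Z)^p | Z <= t] for a lifetime Z with the given PDF and CDF."""
    integral = simpson(lambda z: (t - z) ** p * pdf(z), 0.0, t)
    return integral / cdf(t)


if __name__ == "__main__":
    # Baseline LL with a2 = 1 (assumption): f(z) = (1 + z)^-2, F(z) = z / (1 + z).
    pdf = lambda z: 1.0 / (1.0 + z) ** 2
    cdf = lambda z: z / (1.0 + z)
    print("A_1(1) =", reversed_residual_moment(1, 1.0, pdf, cdf))
    print("closed form 2(1 - ln 2) =", 2.0 * (1.0 - math.log(2.0)))
```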

Numerical Analysis for Skewness and Kurtosis
The effects of the three parameters for the RG-LL model on the mean (µ 1 ), variance (Var(Z)), skewness (Ske(Z)), and kurtosis (Kur(Z)) are listed in Table A1 (see the Appendix A). The effects of the parameter a 2 for the standard LL model on the µ 1 , Var(Z), Ske(Z), and Kur(Z) are listed in Table A2 (see the Appendix A). From Tables A1 and A2 we note that, the new additional shape parameters θ and a 1 have an effect on

Via Clayton Copula
The Bivariate Extension
The bivariate extension via the Clayton Copula can be considered as a weighted version of the Clayton Copula. The associated CDF of the bivariate RG-LL type distribution will be: The "m-dimensional extension" can be written as:

Bivariate RG-LL Type via Modified FGM Copula
Following [5], the joint CDF (J-CDF) of the bivariate modified FGM copula can be expressed as Here $\vartheta(u)$ and $\phi(w)$ are two absolutely continuous functions on (0, 1) with the following conditions: (I)-The "boundary" condition:

Bivariate RG-LL-FGM (Type-I) Model
Here, we consider the following functional form for both ϑ(u) and ϕ(w) as

Bivariate RG-LL-FGM (Type-II) Model
Consider the following functional form for both ϑ(u) and ϕ(w) which satisfy all the conditions stated earlier where ϑ(u) The corresponding bivariate RG-LL-FGM (Type-II) copula can be derived from:
In this case, one can also derive a closed form expression for the associated CDF of the bivariate RG-LL-FGM (Type-III).

Bivariate RG-LL-FGM (Type IV) Model
The J-CDF of the bivariate RG-LL-FGM (Type-IV) model can be derived from

Bivariate RG-LL Type Using Renyi's Entropy Copula
Following [6], the J-CDF of Renyi's entropy Copula can be expressed as $C(u, w) = z_2 u + z_1 w - z_1 z_2$; then, the associated bivariate RG-LL type distribution follows. Many useful details and other similar work can be found in [35][36][37][38][39].
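The copula constructions in this section share one pattern: a copula $C(u, w)$ is evaluated at the marginal CDFs, $u = F(z_1)$ and $w = F(z_2)$. The sketch below illustrates this with the standard FGM and Clayton copulas. The dependence parameter name `delta`, the function names, and the use of baseline LL marginals are assumptions made for illustration; the RG-LL marginals of Equation (3) would be plugged in the same way.

```python
def fgm_copula(u, w, delta):
    """Standard FGM copula: u w [1 + delta (1 - u)(1 - w)], with |delta| <= 1."""
    return u * w * (1.0 + delta * (1.0 - u) * (1.0 - w))


def clayton_copula(u, w, delta):
    """Standard Clayton copula: (u^-delta + w^-delta - 1)^(-1/delta), delta > 0."""
    if u <= 0.0 or w <= 0.0:
        return 0.0
    return (u ** (-delta) + w ** (-delta) - 1.0) ** (-1.0 / delta)


def ll_cdf(z, a2):
    """Baseline LL CDF used here as a stand-in marginal (assumption)."""
    if z <= 0:
        return 0.0
    t = z ** a2
    return t / (1.0 + t)


def bivariate_cdf(z1, z2, copula, cdf1, cdf2):
    """Joint CDF obtained by evaluating a copula at the two marginal CDFs."""
    return copula(cdf1(z1), cdf2(z2))


if __name__ == "__main__":
    m1 = lambda z: ll_cdf(z, 1.5)
    m2 = lambda z: ll_cdf(z, 3.0)
    print(bivariate_cdf(1.0, 2.0, lambda u, w: fgm_copula(u, w, 0.5), m1, m2))
    print(bivariate_cdf(1.0, 2.0, lambda u, w: clayton_copula(u, w, 2.0), m1, m2))
```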

Uncensored Real Data Applications
The log-likelihood function $\ell_n(\xi)$ for $\xi$ can be maximized numerically via "SAS (PROC NLMIXED)", "R (optim)", or the "Ox program (via sub-routine MaxBFGS)", among others. The components of the score vector can be derived easily. We provide three real applications to illustrate the importance, potentiality, and flexibility of the RG-LL model. For these data, we compare the RG-LL distribution with the BXII, Topp-Leone-BXII (TLBXII), Zografos-Balakrishnan-BXII (ZBBXII), Marshall-Olkin-BXII (MOBXII), Five Parameters beta-BXII (FBBXII), Beta-BXII (BBXII), Beta exponentiated-BXII (BEBXII), Five Parameters Kumaraswamy-BXII (FKwBXII), and KwBXII distributions given in [11][12][13][14][15]. The total time in test (TTT) plot (see [40]) is an important graphical approach to verify whether the data can be fitted by a specific distribution or not. The TTT plots of the three real datasets are presented in Figure 2. These plots indicate that the empirical HRFs of the 1st and 2nd datasets are increasing, while the empirical HRF of the 3rd dataset is bathtub shaped.
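The TTT plot mentioned above is built from the scaled TTT transform of the ordered sample, $T(r/n) = \left[\sum_{i=1}^{r} z_{(i)} + (n - r) z_{(r)}\right] / \sum_{i=1}^{n} z_{(i)}$. The sketch below computes these points for a toy sample. The function name and the data are hypothetical and are not taken from the three datasets of the paper.

```python
def scaled_ttt(data):
    """Return the points (r/n, T(r/n)) of the scaled TTT transform."""
    z = sorted(data)
    n = len(z)
    total = sum(z)
    points = []
    partial = 0.0
    for r, value in enumerate(z, start=1):
        partial += value  # running sum of the r smallest observations
        points.append((r / n, (partial + (n - r) * value) / total))
    return points


if __name__ == "__main__":
    toy_sample = [0.8, 1.9, 2.4, 3.1, 3.3, 4.7, 5.2]  # illustrative values only
    for x, y in scaled_ttt(toy_sample):
        print(round(x, 3), round(y, 3))
```

As a reading aid, a concave TTT curve lying above the diagonal is usually taken to indicate an increasing HRF, a convex curve a decreasing HRF, and a curve that is first convex and then concave a bathtub-shaped HRF.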
We consider the following goodness-of-fit statistics: the "Akaike information criterion" (C AI ), the "Bayesian information criterion" (C Bayes ), the "consistent Akaike information criterion" (C CA ), and the "Hannan-Quinn information criterion" (C HQ ). Tables A3-A5 (see Appendix A) give the MLEs, standard errors (SEs), and confidence intervals (CIs) for all datasets. The same tables give the values of C AI , C Bayes , C HQ , and C CA for these datasets.
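For reference, the four criteria can be computed from a maximized log-likelihood $\hat{\ell}$, the number of parameters $k$, and the sample size $n$. The sketch below uses the commonly quoted forms $C_{AI} = 2k - 2\hat{\ell}$, $C_{Bayes} = k \log n - 2\hat{\ell}$, $C_{HQ} = 2k \log(\log n) - 2\hat{\ell}$, and $C_{CA} = 2kn/(n - k - 1) - 2\hat{\ell}$. The exact definitions used for Tables A3-A5 are not restated in this section, so these forms (in particular the one for C CA) are assumptions, and the inputs are illustrative.

```python
import math


def information_criteria(loglik, k, n):
    """Return the four criteria for a model with k parameters fitted to n points.

    The formulas are commonly used forms and are assumptions here.
    """
    return {
        "C_AI": 2 * k - 2 * loglik,
        "C_Bayes": k * math.log(n) - 2 * loglik,
        "C_HQ": 2 * k * math.log(math.log(n)) - 2 * loglik,
        "C_CA": 2 * k * n / (n - k - 1) - 2 * loglik,
    }


if __name__ == "__main__":
    # Illustrative inputs only (not values from Tables A3-A5).
    for name, value in information_criteria(-100.0, 3, 100).items():
        print(name, round(value, 4))
```

Smaller values indicate a better trade-off between fit and complexity, which is how the criteria are read in the comparisons of this section.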
Based on the values in Table A3, we conclude that the RG-LL model provides the best fit for the breaking stress of carbon fibers data as compared to the other competitive models, with the smallest values of C AI = 302.75, C Bayes = 301.17, C HQ = 299.18, and C CA = 300.92. Based on the values in Table A4, the RG-LL model provides the best fit for the survival times data, with the smallest values of C AI = 208.01, C Bayes = 211.11, C HQ = 207.12, and C CA = 209.53. Based on the values in Table A5, the RG-LL model provides the best fit for the leukemia data, with the smallest values of C AI = 313.44, C Bayes = 316.41, C HQ = 313.11, and C CA = 313.02. Figures 2-5 give the total time in test (TTT) plots, the estimated CDF plots, the estimated PDF plots, and the estimated HRF plots, respectively. Based on Figure 2, the HRFs of the three datasets are "monotonically increasing", "monotonically increasing", and "decreasing-constant-increasing", respectively. Based on the values in Tables A3-A5 and Figures 2-5, we conclude that the RG-LL model provides good (and also the best) fits as compared to the other competitive models in the three applications, with the smallest values of C AI , C Bayes , C HQ , and C CA .

Censored Maximum Likelihood
Suppose that $Z_1, Z_2, \ldots, Z_n$ is a random sample (RS) with right censoring from the RG-LL($\xi$) distribution.
The observed data $z_i = \min(Z_i, C_i)$, $i = 1, 2, \ldots, n$, are the minimum of the survival time $Z_i$ and the censoring time $C_i$ for each subject in the sample. Therefore, the data can be written as pairs $(z_i, \nabla_i)$, $i = 1, 2, \ldots, n$, where $\nabla_i = 1$ if $z_i$ is a moment of failure (complete observation) and $\nabla_i = 0$ if $z_i$ is a "moment of censoring". The likelihood function can be written as $L_n(\xi) = \prod_{i=1}^{n} f_{\theta,a_1,a_2}(z_i)^{\nabla_i} [1 - F_{\theta,a_1,a_2}(z_i)]^{1 - \nabla_i}$. The log-likelihood function of the RG-LL($\xi$) distribution and the corresponding score functions are expressed in terms of the quantity $1 + z_i^{a_2}$. The MLEs of the unknown parameters can be obtained using various techniques, such as the R software, the "EM algorithm", or the "Newton Raphson" method.
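The structure of a right-censored log-likelihood, in which failures contribute $\log f(z_i)$ and censored observations contribute $\log[1 - F(z_i)]$, can be sketched as follows. The baseline LL model stands in for the RG-LL model (an assumption), the data are made up, and the crude grid search is only a placeholder for the numerical optimizers mentioned above.

```python
import math


def censored_loglik(z, delta, log_pdf, log_sf):
    """Sum of delta_i * log f(z_i) + (1 - delta_i) * log S(z_i)."""
    return sum(d * log_pdf(x) + (1 - d) * log_sf(x) for x, d in zip(z, delta))


def ll_log_pdf(x, a2):
    """log PDF of the baseline LL model (stand-in for the RG-LL PDF)."""
    return math.log(a2) + (a2 - 1.0) * math.log(x) - 2.0 * math.log1p(x ** a2)


def ll_log_sf(x, a2):
    """log survival function of the baseline LL model: -log(1 + x^a2)."""
    return -math.log1p(x ** a2)


if __name__ == "__main__":
    # Hypothetical censored sample: delta = 1 for failures, 0 for censored times.
    z = [0.4, 0.9, 1.3, 2.1, 2.5, 3.0]
    delta = [1, 1, 0, 1, 0, 1]
    grid = [0.5 + 0.05 * i for i in range(60)]  # crude search over a2
    best = max(grid, key=lambda a: censored_loglik(
        z, delta, lambda x: ll_log_pdf(x, a), lambda x: ll_log_sf(x, a)))
    print("grid maximizer of a2:", round(best, 2))
```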

The Modified BN-GOF Test for Right Censored Data
Based on [23,24], the test statistic $Y_n^2$ is defined in terms of $U_j$ and $e_j$, the observed and expected numbers of failures in the grouping intervals; the other elements are defined in [20,23,24]. The endpoints $\rho_j$ of the $k$ random grouping intervals $I_j$ (with endpoints $\rho_{j-1}$ and $\rho_j$) are chosen so that the expected numbers of failures falling into these intervals are the same for each $j = 1, \ldots, k - 1$, with $\rho_k = \max(z_{(n)}, \tau)$. The estimated endpoints $\hat{\rho}_j$ are defined through $H_{\theta,a_1,a_2}(z_l)$, the cumulative HRF (CHRF) of the model distribution, with $\hat{\rho}_k = \max(z_{(n)}, \tau)$. The test statistic $Y_n^2$ follows a chi-squared distribution.

Choice of Random Grouping Intervals
Suppose that $Z_1, Z_2, \ldots, Z_n$ is an RS with right censoring from the RG-LL($\xi$) model and that $\tau$ is a finite time. The estimated endpoints $\hat{\rho}_j$ are obtained from $H_{\theta,a_1,a_2}(z_l)$, the CHRF of the RG-LL($\xi$) distribution.
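One way to realize the equal-expected-failures rule is sketched below. It assumes the usual Bagdonavicius-Nikulin construction, in which $E_k = \sum_{i} H(z_i)$, $E_j = (j/k) E_k$, and $\hat{\rho}_j$ solves $\sum_i H(\min(z_i, \hat{\rho}_j)) = E_j$. The explicit formula for $\hat{\rho}_j$ in the source is not reproduced in this section, so the details here are an assumption; the baseline LL cumulative hazard $H(z) = \log(1 + z^{a_2})$ is a stand-in, and the function names and data are hypothetical.

```python
import math


def grouping_endpoints(z, k, H, H_inv, tau=None):
    """Endpoints rho_1..rho_k giving equal expected numbers of failures (sketch)."""
    z = sorted(z)
    n = len(z)
    h = [H(x) for x in z]
    prefix = [0.0]  # prefix[i] = H(z_(1)) + ... + H(z_(i))
    for value in h:
        prefix.append(prefix[-1] + value)
    total = prefix[n]  # E_k
    # b[i] = sum over all observations of H(min(z_l, z_(i))), with b[0] = 0
    b = [0.0] + [(n - i) * h[i - 1] + prefix[i] for i in range(1, n + 1)]
    endpoints = []
    for j in range(1, k):
        target = j * total / k  # E_j
        i = next(i for i in range(1, n + 1) if target <= b[i])
        endpoints.append(H_inv((target - prefix[i - 1]) / (n - i + 1)))
    endpoints.append(z[-1] if tau is None else max(z[-1], tau))
    return endpoints


if __name__ == "__main__":
    a2 = 1.5  # hypothetical fitted shape parameter
    H = lambda x: math.log1p(x ** a2)
    H_inv = lambda y: math.expm1(y) ** (1.0 / a2)
    sample = [0.5, 0.8, 1.2, 2.0, 3.3]  # illustrative observed times
    print(grouping_endpoints(sample, 3, H, H_inv))
```

In practice `H` would be the fitted CHRF $H_{\theta,a_1,a_2}$ evaluated at the MLEs, and `H_inv` its inverse.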

Quadratic Form Q of the Statistic $Y_n^2$
To calculate the quadratic form $Q$ of the statistic $Y_n^2$, and since its distribution does not depend on the parameters, we can use the estimated matrices $\hat{W}$ and $\hat{C}$ and the estimated information matrix $\hat{I}$. The elements of $\hat{C}$ are defined in [20].

Estimated Information Matrix $\hat{I}$
We also need the information matrix $\hat{I}$ of the RG-LL($\xi$) model with right censoring, whose elements $\hat{i}$ are obtained after lengthy calculations and some simplifications. Then, we obtain the test statistic for the RG-LL($\xi$) distribution with "right censored" data. This statistic follows a chi-squared distribution with $k$ degrees of freedom.

Simulations under Censorship
In this section, we perform a simulation study to consolidate our results. For this purpose, N = 10,000 censored samples (with sizes n = 25, 50, 130, 350, 500, 1000) from the RG-LL($\xi$) distribution are simulated.
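A right-censored sample of the kind used in such a study can be generated by inverse-transform sampling of the lifetimes and an independent censoring time. In the sketch below, the baseline LL quantile $(u/(1-u))^{1/a_2}$ replaces the RG-LL quantile, and the exponential censoring mechanism, its rate, and the function name are all assumptions made for illustration.

```python
import random


def simulate_censored_ll(n, a2, censor_rate, rng):
    """Return (z, delta): observed times and failure indicators for n subjects."""
    z, delta = [], []
    for _ in range(n):
        u = rng.random()
        lifetime = (u / (1.0 - u)) ** (1.0 / a2)  # inverse-transform draw
        censor = rng.expovariate(censor_rate)     # assumed censoring mechanism
        z.append(min(lifetime, censor))
        delta.append(1 if lifetime <= censor else 0)
    return z, delta


if __name__ == "__main__":
    rng = random.Random(2021)  # fixed seed for reproducibility
    for n in (25, 50, 130):
        z, delta = simulate_censored_ll(n, a2=1.5, censor_rate=0.3, rng=rng)
        print("n =", n, "observed failures:", sum(delta))
```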

Maximum Likelihood Estimation
We generate the simulated samples with various parameter values. Using the R software and the BB algorithm, the means of the simulated MLEs and their mean squared errors (MSEs) are calculated and given in Table 1. As shown by these results, the MLEs are convergent. For testing the null hypothesis $H_0$ that the "right censored" data come from the RG-LL model, we computed the test statistic $Y_n^2(\xi)$ as defined above for N = 10,000 simulated samples from the hypothesized distribution with different sizes (n = 25, 50, 130, 350, 500, 1000). Then, we calculated the empirical levels of significance, that is, the proportion of samples for which $Y_n^2 > \chi^2_{\varepsilon}(r)$, corresponding to the theoretical levels of significance ($\varepsilon$ = 0.10, 0.05, 0.01); we choose k = 5. The results are reported in Table 2. Based on Table 2, the test proposed in this work can be used to fit data from this new model.

Censored Validation under Real Data
Example 1. Reference [41] reported survival data on 26 psychiatric inpatients admitted to the University of Iowa hospitals during the years 1935-1948. This sample is part of a larger study of psychiatric inpatients discussed by [42]. The data for each patient consist of age at first admission to the hospital, sex, number of years of follow-up (years from admission to death or censoring), and patient status at the follow-up time. The data are grouped into k = 5 intervals $I_j$. The necessary calculations are given in Table 3. Then, we obtain the value of the test statistic $Y_n^2$. For the significance level $\varepsilon$ = 0.05, the critical value $\chi^2_5$ = 11.0705 exceeds the value of $Y_n^2$ = 7.9356 (see Table 4), so we can say that the proposed RG-LL model fits these data. We also calculated the test statistic $Y_n^2$ to fit these data to the competing models.
Then, we grouped the observations into r = 5 intervals $I_j$. The intermediate calculations are given in Table 5. Based on Table 5, the value of $Y_n^2$ = 8.9023 is less than the critical value $\chi^2_5$ = 11.0705 (for significance level $\varepsilon$ = 0.05), so we can say that these data can be fitted by the RG-LL model. Many useful uncensored real-life data sets in life testing, economics, medicine, and engineering can be found in [44][45][46][47][48][49][50][51][52].
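The decision rule used in these examples compares $Y_n^2$ with a chi-squared critical value. The sketch below evaluates the chi-squared survival function for an integer number of degrees of freedom using the standard closed-form series (an exponential sum for even degrees of freedom, `math.erfc` plus a finite sum for odd ones) and applies it to the two statistics reported above with 5 degrees of freedom. The function name is hypothetical.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a chi-squared variable with a positive integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2.0) / r
            total += term
        return math.exp(-x / 2.0) * total
    # erfc(sqrt(x/2)) + sqrt(2/pi) e^{-x/2} sum_r x^(r-1/2) / (1*3*...*(2r-1))
    total = 0.0
    term = math.sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * total


if __name__ == "__main__":
    for y2 in (7.9356, 8.9023):  # statistics reported in the examples above
        p_value = chi2_sf(y2, 5)
        verdict = "do not reject" if p_value > 0.05 else "reject"
        print("Y2 =", y2, "p-value =", round(p_value, 4), "->", verdict, "H0")
```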

Conclusions
In this paper, a new three-parameter version of the log-logistic model is introduced and studied. Some of its mathematical properties are derived. The new hazard rate function can be "asymmetric monotonically increasing", "decreasing-constant", "J shaped", or "constant". A simple type copula is considered for deriving many bivariate and multivariate extensions using "Farlie-Gumbel-Morgenstern Copula", "modified Farlie-Gumbel-Morgenstern Copula", "Clayton Copula", and "Renyi's entropy Copula". Three applications to real data sets are provided to illustrate the flexibility and importance of the new model under the uncensored scheme. Using the approach of the "Bagdonavicius-Nikulin" goodness-of-fit test for right censored validation, we propose a new modified chi-square goodness-of-fit test for the new log-logistic model. The modified goodness-of-fit test statistic is applied to the right censored real dataset of survival times of psychiatric inpatients admitted to the University of Iowa hospitals. Based on the maximum likelihood estimators on initial data, the modified test recovers the loss in information caused by grouping the data and follows a chi-square distribution. All elements of the modified test criteria are explicitly derived and given.

Acknowledgments:
The authors gratefully acknowledge the very thoughtful and constructive comments and suggestions of the four reviewers, which resulted in a much improved paper.

Conflicts of Interest:
The authors declare no conflict of interest.

Table A3. MLEs, SEs, and CIs with C AI , C Bayes , C HQ , and C CA for the breaking stress of carbon fibers data.
Table A4. MLEs, SEs, and CIs with C AI , C Bayes , C HQ , and C CA for the survival times data.
Table A5. MLEs, SEs, and CIs with C AI , C Bayes , C HQ , and C CA for the leukemia data.