Modeling Extreme Values Utilizing an Asymmetric Probability Function

Abstract: In this article, a new flexible probability density function with three parameters is proposed for modeling asymmetric data (positive and negative) with different types of kurtosis (mesokurtic, leptokurtic and platykurtic). Some of its statistical and reliability properties, including hazard rate function, moments, moment generating function, incomplete moments, mean deviations, moment of the residual life, moment of the reversed residual life, and order statistics are derived. Its hazard rate function can be either constant, increasing-constant, decreasing-constant, U shape, upside down shape or upside down-U shape. Seven classical estimation methods are considered to estimate the unknown model parameters. Monte Carlo simulation experiments are performed to compare the performance of the seven different estimation methods. Finally, a distinctive asymmetric real data application is analyzed for illustrating the flexibility of the new model.


Introduction
Recently, Nadarajah and Haghighi [1] presented and studied a new lifetime model and pointed out that its probability density function (PDF) has a zero mode. A random variable (RV) $Z$ is said to have the Nadarajah and Haghighi (NH) model if its survival function (SF) and PDF are given by

$$R_c(z) = e^{1-(1+z)^{c}}, \quad z \ge 0,\ c > 0, \tag{1}$$

and

$$g_c(z) = c\,(1+z)^{c-1} e^{1-(1+z)^{c}}, \quad z \ge 0,\ c > 0, \tag{2}$$

respectively, where $c$ is a shape parameter. Nadarajah and Haghighi [1] considered (2) as an alternative to the exponential (Exp), gamma (Gam), Weibull (W) and exponentiated exponential (ExpExp) distributions. Several extensions of the NH model can be cited, such as the exponentiated NH (E-NH) model by Lemonte [2], the Gam NH (Gam-NH) and Poisson Gam NH (PGam-NH) by Ortega et al. [3], transmuted NH (Tr-NH) by Ahmed et al. [4], Kumaraswamy NH (Kuw-NH) by Lim [5], modified NH (Mo-NH) by El-Damcese and Ramadan [6], Marshall-Olkin NH (MO-NH) by Lemonte et al. [7], Topp-Leone NH (TL-NH) by Yousof and Korkmaz [8], beta NH (B-NH) by Dias [9], inverted NH (I-NH) by Tahir et al. [10], and NH Lindley (NH-L) by Pena et al. [11].

The PDF and cumulative distribution function (CDF) of the Topp-Leone exponentiated-G (TLE-G) family are given by

$$f(z) = 2ab\, g_{\Omega}(z)\, G_{\Omega}(z)^{ab-1} \left[1 - G_{\Omega}(z)^{b}\right] \left[2 - G_{\Omega}(z)^{b}\right]^{a-1} \tag{3}$$

and

$$F(z) = G_{\Omega}(z)^{ab} \left[2 - G_{\Omega}(z)^{b}\right]^{a}, \tag{4}$$

respectively, where $a, b > 0$, $G_{\Omega}(z)$ is the CDF of any baseline model (here $G_{\Omega}(z) = 1 - R_c(z)$) and $g_{\Omega}(z) = dG_{\Omega}(z)/dz$ is the PDF of the baseline model. For $b = 1$, we get the TL family. By inserting (1) and (2) into (3), we can write the PDF of the TLE-NH model as

$$f(z) = 2abc\,(1+z)^{c-1} e^{1-(1+z)^{c}}\, H_{ab-1,c}(z) \left[1 - H_{b,c}(z)\right] \left[2 - H_{b,c}(z)\right]^{a-1}, \tag{5}$$

where $H_{\gamma_1,\gamma_2}(z) = \left[1 - e^{1-(1+z)^{\gamma_2}}\right]^{\gamma_1}$.

After a quick study of the TLE-NH properties, different classical estimation methods under uncensored schemes are considered, namely the maximum likelihood (ML), Anderson-Darling (AD), ordinary least squares (OLS), Cramér-von Mises (CVM), weighted least squares (WLS), left-tail Anderson-Darling (LTAD), and right-tail Anderson-Darling (RTAD) methods. Numerical simulations are performed to compare the estimation approaches using different sample sizes for three different combinations of parameters.
The corresponding CDF is given by

$$F(z) = H_{ab,c}(z) \left[2 - H_{b,c}(z)\right]^{a}. \tag{6}$$

For $b = 1$, the TLE-NH reduces to the TL-NH (see [8]). The CDF in (6) can be expressed as

$$F(z) = \sum_{\varsigma_1=0}^{\infty} C(\varsigma_1)\, H_{\varsigma_1^{\bullet},c}(z), \tag{7}$$

where $C(\varsigma_1) = (-1)^{\varsigma_1} \frac{1}{2^{\varsigma_1 - a}} \binom{a}{\varsigma_1}$, $\varsigma_1^{\bullet} = b(a + \varsigma_1)$, and $H_{\varsigma_1^{\bullet},c}(z) = G(z; c)^{\varsigma_1^{\bullet}}$ is the CDF of the E-NH model with power parameter $\varsigma_1^{\bullet}$. The corresponding TLE-NH density function can be formulated as

$$f(z) = \sum_{\varsigma_1=0}^{\infty} C(\varsigma_1)\, h_{\varsigma_1^{\bullet},c}(z), \tag{8}$$

where $h_{\varsigma_1^{\bullet},c}(z)$ is the E-NH PDF with power parameter $\varsigma_1^{\bullet}$.

We provide some plots of the PDF and hazard rate function (HRF) of the TLE-NH model to show its flexibility. Figure 1a displays plots of the TLE-NH density for some values of $a$, $b$ and $c$, and Figure 1b displays the corresponding plots of the HRF. Figure 1a shows that the PDF of the new model is right-skewed with a variety of shapes, whereas Figure 1b shows that the HRF of the TLE-NH model can take many important shapes, such as "constant (a = 1, b = 1 and c = 1)", "increasing-constant (a = 1, b = 2 and c = 1)", "decreasing-constant (a = 0.01, b = 0.5 and c = 1)", "U shape (a = 1, b = 0.5 and c = 1.4)", "upside down shape (a = 3, b = 1 and c = 1)" and "upside down-U shape (a = 3.25, b = 1 and c = 1.4)".
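The model functions can be sketched numerically as follows. This is a minimal illustration and not the authors' code: it assumes the TLE-NH CDF has the form $F(z) = [G(z)^{b}(2 - G(z)^{b})]^{a}$ with $G(z) = 1 - e^{1-(1+z)^{c}}$, and all function names (`nh_cdf`, `tle_nh_cdf`, `tle_nh_pdf`, `tle_nh_hrf`, `tle_nh_quantile`, `tle_nh_sample`) are hypothetical.

```python
import math
import random


def nh_cdf(z, c):
    """NH baseline CDF G(z) = 1 - exp(1 - (1 + z)^c) for z >= 0."""
    return -math.expm1(1.0 - (1.0 + z) ** c)


def tle_nh_cdf(z, a, b, c):
    """Assumed TLE-NH CDF: F(z) = [G(z)^b * (2 - G(z)^b)]^a."""
    t = nh_cdf(z, c) ** b
    return (t * (2.0 - t)) ** a


def tle_nh_pdf(z, a, b, c):
    """Derivative of the assumed CDF; requires z > 0 when a * b < 1."""
    big_g = nh_cdf(z, c)
    small_g = c * (1.0 + z) ** (c - 1.0) * math.exp(1.0 - (1.0 + z) ** c)
    t = big_g ** b
    return (2.0 * a * b * small_g * big_g ** (a * b - 1.0)
            * (1.0 - t) * (2.0 - t) ** (a - 1.0))


def tle_nh_hrf(z, a, b, c):
    """Hazard rate f(z) / (1 - F(z))."""
    return tle_nh_pdf(z, a, b, c) / (1.0 - tle_nh_cdf(z, a, b, c))


def tle_nh_quantile(u, a, b, c):
    """Invert the assumed CDF for 0 <= u < 1."""
    w = u ** (1.0 / a)                # w = t * (2 - t), with t = G^b
    t = 1.0 - math.sqrt(1.0 - w)      # root of t * (2 - t) = w lying in [0, 1]
    big_g = t ** (1.0 / b)
    return (1.0 - math.log1p(-big_g)) ** (1.0 / c) - 1.0


def tle_nh_sample(n, a, b, c, seed=0):
    """Draw n variates by inverse-transform sampling."""
    rng = random.Random(seed)
    return [tle_nh_quantile(rng.random(), a, b, c) for _ in range(n)]
```

Under this assumed form, setting a = b = c = 1 gives an exponential distribution with rate 2, so `tle_nh_hrf(z, 1, 1, 1)` is constant in `z`, in line with the "constant" case listed for Figure 1b.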

We are motivated to introduce and study the TLE-NH model for the following reasons:
i. The new density in (5) can be asymmetric, unimodal and right-skewed, with many useful shapes.
ii. The HRF of the new model can be constant, increasing-constant, bathtub (U-HRF), decreasing-constant or upside-down (reversed U-HRF). These shapes make the TLE-NH model well suited to data sets whose HRF is constant, increasing-constant, bathtub, decreasing-constant or upside-down bathtub.
iii. The new TLE-NH model is recommended for modeling the remission times (in months) of bladder cancer patients. The bladder cancer data contain some extreme values.
iv. The nonparametric kernel density estimate of these data is asymmetric with a heavy right tail. Therefore, the TLE-NH model could be useful for asymmetric real data, especially unimodal and bimodal heavy-tailed right-skewed data.
v. The new TLE-NH model is flexible enough to exhibit asymmetric densities with heavy right tails, as illustrated in Figure 1a.
vi. The HRF of the bladder cancer data is upside-down; this matches the new model, whose HRF can be upside-down as illustrated in Figure 1b. It is worth mentioning that this class of probability distributions for asymmetric data also has important uses in insurance (see Maciak et al. [12] for more details) and in dependence modeling (see Gijbels et al. [13]).
vii. The skewness of the TLE-NH model lies in the interval (−0.92852, 43.03816), whereas the skewness of the baseline NH model lies in the interval (0.27557, 2). This wider range favors the TLE-NH model in modeling and prediction, since many real-life data sets are negatively skewed and the baseline NH model cannot accommodate such cases (see Tables 1 and 2 and Figures 2 and 3).
viii. The kurtosis of the TLE-NH model lies between 3.03238 and 2722.165, whereas the kurtosis of the NH model ranges from 3.23056 to 9. Thus, the TLE-NH extension could be useful for mesokurtic, leptokurtic and platykurtic data sets (see Tables 1 and 2 and Figures 2 and 3).
ix. The parameters of the TLE-NH model can be estimated by the maximum likelihood, Anderson-Darling, ordinary least squares, Cramér-von Mises, weighted least squares, left-tail Anderson-Darling, and right-tail Anderson-Darling methods. Although all estimation methods perform well, the weighted least squares method is the best in real data modeling, with only slight differences in results.

Moments
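The $q$th ordinary moment of $Z$, $\mu'_{q,Z} = E(Z^{q})$, is given by Equation (9), where $\Gamma(c, z) = \int_{z}^{\infty} t^{c-1} e^{-t}\,dt$ refers to the complementary incomplete gamma function. For integer $\varsigma_1^{\bullet} > 0$, the moments in (9) reduce to Equation (10). Setting $q = 1$ in (9) and (10) gives the mean of $Z$. The $q$th central moment of $Z$, say $\mu_{q,Z}$, is

$$\mu_{q,Z} = E\left[(Z - \mu'_{1,Z})^{q}\right] = \sum_{h=0}^{q} \binom{q}{h} \left(-\mu'_{1,Z}\right)^{q-h} \mu'_{h,Z}.$$

Table 1 lists the expected value (E(Z)), variance (V(Z)), skewness (S(Z)) and kurtosis (K(Z)) for the TLE-NH model, whereas Table 2 lists the same measures for the NH model.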
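As a rough numerical cross-check of such moment tables, the ordinary moments can be approximated from the survival function via $E(Z^{q}) = q \int_{0}^{\infty} z^{q-1} [1 - F(z)]\,dz$. The sketch below assumes the CDF form $F(z) = [G(z)^{b}(2 - G(z)^{b})]^{a}$ and uses composite Simpson's rule rather than the paper's series expressions; `tle_nh_sf`, `raw_moment` and `moment_summary` are hypothetical names.

```python
import math


def tle_nh_sf(z, a, b, c):
    """Survival function 1 - F(z) under the assumed TLE-NH CDF."""
    t = (-math.expm1(1.0 - (1.0 + z) ** c)) ** b
    return 1.0 - (t * (2.0 - t)) ** a


def raw_moment(q, a, b, c, n=20000):
    """E(Z^q) = q * int_0^inf z^(q-1) S(z) dz by composite Simpson's rule."""
    upper = 1.0
    while tle_nh_sf(upper, a, b, c) > 1e-14:   # extend until the tail is negligible
        upper *= 2.0
    h = upper / n
    total = 0.0
    for i in range(n + 1):
        z = i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * q * z ** (q - 1) * tle_nh_sf(z, a, b, c)
    return total * h / 3.0


def moment_summary(a, b, c):
    """Return (mean, variance, skewness, kurtosis) from the first four moments."""
    m1, m2, m3, m4 = (raw_moment(q, a, b, c) for q in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3
    mu4 = m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4
    return m1, var, mu3 / var ** 1.5, mu4 / var ** 2
```

The fixed-grid integration is only adequate for light to moderate tails; for small `c` the upper limit becomes very large and an adaptive rule would be needed.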

Moment Generating Function (MGF)
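The MGF $M_Z(\tau) = E(e^{\tau Z})$ of $Z$ can be derived from Equation (9) or (10) as

$$M_Z(\tau) = \sum_{q=0}^{\infty} \frac{\tau^{q}}{q!}\, E(Z^{q}),$$

with $E(Z^{q})$ taken from (9) or (10).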

Incomplete Moments (I-Ms)
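The $s$th I-M, say $I_{s,Z}(\tau)$, of $Z$ can be expressed from (8) as

$$I_{s,Z}(\tau) = \int_{0}^{\tau} z^{s} f(z)\,dz = \sum_{\varsigma_1=0}^{\infty} C(\varsigma_1) \int_{0}^{\tau} z^{s}\, h_{\varsigma_1^{\bullet},c}(z)\,dz.$$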

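The mth Moment of the Residual Life (MoRL)
The $m$th MoRL can be formulated as

$$E\left[(Z - \tau)^{m} \mid Z > \tau\right], \quad m = 1, 2, \ldots$$

Then, the $m$th MoRL of the TLE-NH model can be reported by

$$\frac{1}{1 - F(\tau)} \int_{\tau}^{\infty} (z - \tau)^{m} f(z)\,dz,$$

where $f$ and $F$ are the TLE-NH PDF and CDF. The life expectation at age $\tau$ is obtained for $m = 1$; it represents the additional expected life length of a unit which is alive at age $\tau$.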

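The mth Moment of the Reversed Residual Life (MoRRL)
The $m$th moment of the reversed residual life can be expressed as

$$E\left[(\tau - Z)^{m} \mid Z \le \tau\right], \quad m = 1, 2, \ldots$$

Then, the $m$th MoRRL of the TLE-NH model can be formulated as

$$\frac{1}{F(\tau)} \int_{0}^{\tau} (\tau - z)^{m} f(z)\,dz.$$

The mean inactivity time (MIT) is obtained for $m = 1$; it is the elapsed waiting time since the failure of a certain subsystem, given that the failure occurred in $(0, \tau)$.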
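For $m = 1$, both quantities reduce to ratios of simple integrals: the life expectation equals $\int_{\tau}^{\infty} [1 - F(z)]\,dz / [1 - F(\tau)]$ and the MIT equals $\int_{0}^{\tau} F(z)\,dz / F(\tau)$. The following sketch evaluates them numerically under the assumed CDF form $F(z) = [G(z)^{b}(2 - G(z)^{b})]^{a}$; the helper names are hypothetical and the fixed-grid Simpson rule is a convenience, not the paper's closed-form expressions.

```python
import math


def tle_nh_cdf(z, a, b, c):
    """Assumed TLE-NH CDF: F(z) = [G^b * (2 - G^b)]^a, G the NH CDF."""
    t = (-math.expm1(1.0 - (1.0 + z) ** c)) ** b
    return (t * (2.0 - t)) ** a


def simpson(func, lo, hi, n=4000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3.0


def mean_residual_life(tau, a, b, c):
    """E(Z - tau | Z > tau) = int_tau^inf S(z) dz / S(tau)."""
    def sf(z):
        return 1.0 - tle_nh_cdf(z, a, b, c)
    upper = max(1.0, 2.0 * tau)
    while sf(upper) > 1e-14:          # extend until the tail is negligible
        upper *= 2.0
    return simpson(sf, tau, upper) / sf(tau)


def mean_inactivity_time(tau, a, b, c):
    """E(tau - Z | Z <= tau) = int_0^tau F(z) dz / F(tau)."""
    def cdf(z):
        return tle_nh_cdf(z, a, b, c)
    return simpson(cdf, 0.0, tau) / cdf(tau)
```

In the special case a = b = c = 1 the assumed model is exponential with rate 2, so the life expectation is 0.5 at every age, which gives a convenient sanity check.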

Order Statistics
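Let $Z_1, Z_2, \ldots, Z_m$ be an observed random sample (RS) from the TLE-NH model and let $Z_{(1:m)}, Z_{(2:m)}, \ldots, Z_{(m:m)}$ be the corresponding order statistics. Then the PDF of the $\varsigma$th order statistic can be written as

$$f_{\varsigma:m}(z) = \frac{f(z)}{B(\varsigma, m - \varsigma + 1)} \sum_{j=0}^{m-\varsigma} (-1)^{j} \binom{m-\varsigma}{j} F(z)^{j+\varsigma-1}, \tag{11}$$

where $B(\cdot, \cdot)$ is the beta function. Substituting (5) and (6) in (11) gives the PDF of $Z_{(\varsigma:m)}$ for the TLE-NH model.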
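The alternating-sum form of the order-statistic density can be checked numerically against the equivalent product form $f(z) F(z)^{\varsigma-1} [1 - F(z)]^{m-\varsigma} / B(\varsigma, m - \varsigma + 1)$. The sketch below does this under the assumed TLE-NH CDF $F(z) = [G(z)^{b}(2 - G(z)^{b})]^{a}$; function names are hypothetical.

```python
import math


def tle_nh_cdf(z, a, b, c):
    """Assumed TLE-NH CDF: F(z) = [G^b * (2 - G^b)]^a, G the NH CDF."""
    t = (-math.expm1(1.0 - (1.0 + z) ** c)) ** b
    return (t * (2.0 - t)) ** a


def tle_nh_pdf(z, a, b, c):
    """Derivative of the assumed CDF (z > 0)."""
    big_g = -math.expm1(1.0 - (1.0 + z) ** c)
    small_g = c * (1.0 + z) ** (c - 1.0) * math.exp(1.0 - (1.0 + z) ** c)
    t = big_g ** b
    return (2.0 * a * b * small_g * big_g ** (a * b - 1.0)
            * (1.0 - t) * (2.0 - t) ** (a - 1.0))


def beta_fn(x, y):
    """Beta function B(x, y) via gamma functions."""
    return math.gamma(x) * math.gamma(y) / math.gamma(x + y)


def order_stat_pdf_series(z, s, m, a, b, c):
    """PDF of the s-th of m order statistics via the alternating sum."""
    f, big_f = tle_nh_pdf(z, a, b, c), tle_nh_cdf(z, a, b, c)
    series = sum((-1) ** j * math.comb(m - s, j) * big_f ** (j + s - 1)
                 for j in range(m - s + 1))
    return f * series / beta_fn(s, m - s + 1)


def order_stat_pdf_product(z, s, m, a, b, c):
    """Same density written as f * F^(s-1) * (1 - F)^(m-s) / B(s, m-s+1)."""
    f, big_f = tle_nh_pdf(z, a, b, c), tle_nh_cdf(z, a, b, c)
    return f * big_f ** (s - 1) * (1.0 - big_f) ** (m - s) / beta_fn(s, m - s + 1)
```

The two forms agree because the sum is the binomial expansion of $[1 - F(z)]^{m-\varsigma}$; for large `m` the product form is numerically safer, since the alternating sum can suffer from cancellation.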

Estimation Methods
We discuss seven methods to estimate the parameters of the TLE-NH model. They can be implemented using the "AdequacyModel" package in the "R" software, which provides a general meta-heuristic optimization technique for maximizing or minimizing an arbitrary objective function. The main aim of using several estimation approaches is to find the best-performing estimators; see, for instance, Eliwa et al. [14], El-Morshedy et al. [15], Hamedani et al. [16] and Elgohari et al. [17], among others.

Maximum Likelihood Estimation (MLE) Method
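Let $Z_1, Z_2, \ldots, Z_m$ be any observed RS from the new TLE-NH model. The log-likelihood function $\ell_{(m)}[\Omega]$ for $\Omega = (a, b, c)^{\top}$ may be expressed as

$$\ell_{(m)}[\Omega] = \sum_{i=1}^{m} \log f(z_i),$$

where $f$ is the TLE-NH PDF in (5). Following the usual routine of parameter estimation for the MLE of $a$, $b$ and $c$, we differentiate $\ell_{(m)}[\Omega]$ with respect to $a$, $b$ and $c$ to obtain the score vector

$$U[\Omega] = \left(U(a), U(b), U(c)\right)^{\top} = \left(\frac{\partial \ell_{(m)}[\Omega]}{\partial a}, \frac{\partial \ell_{(m)}[\Omega]}{\partial b}, \frac{\partial \ell_{(m)}[\Omega]}{\partial c}\right)^{\top}.$$

Setting $U(a) = U(b) = U(c) = 0$ and solving these equations simultaneously yields the MLE of $\Omega$.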
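A minimal sketch of the log-likelihood and a finite-difference approximation to the score vector is given below. It assumes the TLE-NH density $f(z) = 2ab\,g(z)\,G(z)^{ab-1}[1 - G(z)^{b}][2 - G(z)^{b}]^{a-1}$ with NH baseline; the names `tle_nh_logpdf`, `log_likelihood` and `numerical_score` are hypothetical, and the data in the usage note are illustrative values, not the paper's data set.

```python
import math


def tle_nh_logpdf(z, a, b, c):
    """Log of the assumed TLE-NH density for z > 0."""
    power = (1.0 + z) ** c
    log_big_g = math.log(-math.expm1(1.0 - power))    # log G(z)
    t = math.exp(b * log_big_g)                       # G(z)^b
    return (math.log(2.0 * a * b * c) + (c - 1.0) * math.log1p(z)
            + 1.0 - power + (a * b - 1.0) * log_big_g
            + math.log1p(-t) + (a - 1.0) * math.log(2.0 - t))


def log_likelihood(params, data):
    """Sum of log-densities over the sample; params = (a, b, c)."""
    a, b, c = params
    return sum(tle_nh_logpdf(z, a, b, c) for z in data)


def numerical_score(params, data, h=1e-6):
    """Central-difference approximation to (U(a), U(b), U(c))."""
    grads = []
    for i in range(3):
        up, down = list(params), list(params)
        up[i] += h
        down[i] -= h
        grads.append((log_likelihood(up, data) - log_likelihood(down, data)) / (2.0 * h))
    return grads
```

For example, `numerical_score((2.0, 0.7, 1.3), [0.5, 1.2, 2.0])` returns three finite numbers; at the MLE all three would be close to zero. In practice the log-likelihood would be handed to a numerical optimizer rather than solved through the score equations by hand.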

Cramér-Von-Mises Estimation (CVME) Method
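The CVME of the parameters $a$, $b$ and $c$ are obtained by minimizing

$$CVM(\Omega) = \frac{1}{12m} + \sum_{\varsigma=1}^{m} \left[F_{\Omega}(z_{[\varsigma:m]}) - \varepsilon_{(\varsigma,m)}\right]^{2}$$

with respect to (WRT) $a$, $b$ and $c$, where $\varepsilon_{(\varsigma,m)} = (2\varsigma - 1)/(2m)$. Equivalently, the CVME are obtained by solving the three non-linear equations

$$\sum_{\varsigma=1}^{m} \left[F_{\Omega}(z_{[\varsigma:m]}) - \varepsilon_{(\varsigma,m)}\right] \frac{\partial F_{\Omega}(z_{[\varsigma:m]})}{\partial \theta} = 0, \quad \theta \in \{a, b, c\}.$$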

Ordinary Least Squares Estimation (OLSE) Method
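Let $F_{\Omega}(z_{[\varsigma:m]})$ denote the CDF of the TLE-NH model and let $Z_1 < Z_2 < \ldots < Z_m$ be the $m$ ordered observations of the RS. The OLSEs are obtained by minimizing

$$OLS(\Omega) = \sum_{\varsigma=1}^{m} \left[F_{\Omega}(z_{[\varsigma:m]}) - \frac{\varsigma}{m+1}\right]^{2}.$$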

Anderson-Darling Estimation (ADE) Method
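The ADEs are obtained by minimizing the function

$$AD(\Omega) = -m - \frac{1}{m} \sum_{\varsigma=1}^{m} (2\varsigma - 1) \left\{\log F_{\Omega}(z_{[\varsigma:m]}) + \log\left[1 - F_{\Omega}(z_{[m+1-\varsigma:m]})\right]\right\}.$$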

Right Tail-Anderson-Darling Estimation (RT-ADE) Method
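The RTADE is obtained by minimizing

$$RTAD(\Omega) = \frac{m}{2} - 2 \sum_{\varsigma=1}^{m} F_{\Omega}(z_{[\varsigma:m]}) - \frac{1}{m} \sum_{\varsigma=1}^{m} (2\varsigma - 1) \log\left[1 - F_{\Omega}(z_{[m+1-\varsigma:m]})\right].$$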

Left Tail-Anderson-Darling Estimation (LT-ADE) Method
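The LTADE is obtained by minimizing

$$LTAD(\Omega) = -\frac{3}{2} m + 2 \sum_{\varsigma=1}^{m} F_{\Omega}(z_{[\varsigma:m]}) - \frac{1}{m} \sum_{\varsigma=1}^{m} (2\varsigma - 1) \log F_{\Omega}(z_{[\varsigma:m]}).$$

The parameter estimates can be derived by solving $\partial LTAD(\Omega)/\partial a = \partial LTAD(\Omega)/\partial b = \partial LTAD(\Omega)/\partial c = 0$.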
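The distance-based objective functions of this section can be collected in one routine, sketched below in their commonly used forms. The CDF form $F(z) = [G(z)^{b}(2 - G(z)^{b})]^{a}$ is an assumption, as is the weighted least squares objective (named in the text but not defined in this section, so its usual weights $(m+1)^{2}(m+2)/[\varsigma(m-\varsigma+1)]$ are used). The clipping constant `eps` is a numerical safeguard added here, and all names are hypothetical.

```python
import math


def tle_nh_cdf(z, a, b, c):
    """Assumed TLE-NH CDF: F(z) = [G^b * (2 - G^b)]^a, G the NH CDF."""
    t = (-math.expm1(1.0 - (1.0 + z) ** c)) ** b
    return (t * (2.0 - t)) ** a


def distance_objectives(params, data, eps=1e-12):
    """Return the CVM, OLS, WLS, AD, RTAD and LTAD objective values."""
    a, b, c = params
    z = sorted(data)
    m = len(z)
    # Clip F away from 0 and 1 so that the logarithms stay finite.
    cdf = [min(max(tle_nh_cdf(v, a, b, c), eps), 1.0 - eps) for v in z]
    idx = range(1, m + 1)
    cvm = 1.0 / (12 * m) + sum((cdf[i - 1] - (2 * i - 1) / (2 * m)) ** 2 for i in idx)
    ols = sum((cdf[i - 1] - i / (m + 1)) ** 2 for i in idx)
    wls = sum((m + 1) ** 2 * (m + 2) / (i * (m - i + 1))
              * (cdf[i - 1] - i / (m + 1)) ** 2 for i in idx)
    log_lower = sum((2 * i - 1) * math.log(cdf[i - 1]) for i in idx)
    log_upper = sum((2 * i - 1) * math.log(1.0 - cdf[m - i]) for i in idx)
    return {
        "CVM": cvm,
        "OLS": ols,
        "WLS": wls,
        "AD": -m - (log_lower + log_upper) / m,
        "RTAD": m / 2 - 2 * sum(cdf) - log_upper / m,
        "LTAD": -1.5 * m + 2 * sum(cdf) - log_lower / m,
    }
```

Each estimator is the parameter vector minimizing the corresponding entry. With these definitions the AD objective equals the sum of the RTAD and LTAD objectives, which is a useful consistency check on an implementation.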

Simulation for Comparing Various Estimation Methods
Simulation studies are performed to compare and assess the above-mentioned estimation methods. The simulation studies are based on N = 1000 generated data sets from the TLE-NH model, where n = 50, 100, 150, 200 and a = 3.0, b = 0.3 and c = 0.1. The performance of the different estimators is compared in terms of the average of the estimates, AV(Ω), and the mean squared error, MSE(Ω). The 95% confidence intervals (lower CI (LCI), upper CI (UCI)) have also been calculated. Tables 3-5 list the simulation results. From Tables 3-5, it is noted that the MSE(Ω) tends to zero and the AVs tend to the true parameter values as n increases, which indicates that the estimators are consistent. For more illustration and based on Table 3, we have the following results for a = 3:
v. The MSE under AD decreased from 0.19796 (n = 50) to 0.05345 (n = 200).
vi. The MSE under RT-AD decreased from 0.27743 (n = 50) to 0.07353 (n = 200).
vii. The MSE under LT-AD decreased from 0.18347 (n = 50) to 0.04942 (n = 200).
Similar results are recorded regarding the other two parameters.
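The structure of such a Monte Carlo experiment can be sketched as follows for a single method (OLS). This is an illustration under several assumptions: the CDF form $F(z) = [G(z)^{b}(2 - G(z)^{b})]^{a}$, a simple compass (pattern) search in log-parameters in place of the meta-heuristic optimizer of "AdequacyModel", starting values set to the true parameters, and far fewer replications than N = 1000. All names are hypothetical.

```python
import math
import random

TRUE_PARAMS = (3.0, 0.3, 0.1)   # a, b, c used in the simulation study


def tle_nh_cdf(z, a, b, c):
    """Assumed TLE-NH CDF; returns 1 when (1 + z)^c overflows a float."""
    try:
        power = (1.0 + z) ** c
    except OverflowError:
        return 1.0
    t = (-math.expm1(1.0 - power)) ** b
    return (t * (2.0 - t)) ** a


def tle_nh_quantile(u, a, b, c):
    """Inverse of the assumed CDF for 0 <= u < 1."""
    t = 1.0 - math.sqrt(1.0 - u ** (1.0 / a))
    big_g = min(t ** (1.0 / b), 1.0 - 2.0 ** -53)   # keep log1p finite
    return (1.0 - math.log1p(-big_g)) ** (1.0 / c) - 1.0


def ols_objective(theta, z_sorted):
    """OLS distance; theta holds log(a), log(b), log(c) so a, b, c stay positive."""
    a, b, c = (math.exp(v) for v in theta)
    m = len(z_sorted)
    return sum((tle_nh_cdf(z, a, b, c) - (i + 1) / (m + 1)) ** 2
               for i, z in enumerate(z_sorted))


def compass_search(obj, x0, step=0.5, tol=1e-3, max_iter=200):
    """Derivative-free search: try +/- step on each coordinate, halve on failure."""
    x, fx, it = list(x0), obj(x0), 0
    while step > tol and it < max_iter:
        improved = False
        for i in range(len(x)):
            for d in (step, -step):
                y = x[:]
                y[i] += d
                fy = obj(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
        if not improved:
            step /= 2.0
        it += 1
    return x, fx


def simulate(n, reps, seed=1):
    """Return (AV, MSE) of the OLS estimates of (a, b, c) over reps samples."""
    rng = random.Random(seed)
    start = [math.log(v) for v in TRUE_PARAMS]   # search starts at the true values
    estimates = []
    for _ in range(reps):
        z = sorted(tle_nh_quantile(rng.random(), *TRUE_PARAMS) for _ in range(n))
        theta, _ = compass_search(lambda th: ols_objective(th, z), start)
        estimates.append([math.exp(v) for v in theta])
    av = [sum(e[j] for e in estimates) / reps for j in range(3)]
    mse = [sum((e[j] - TRUE_PARAMS[j]) ** 2 for e in estimates) / reps
           for j in range(3)]
    return av, mse
```

A call such as `simulate(n=50, reps=20)` returns the average estimates and MSEs for one sample size; repeating it for n = 50, 100, 150, 200 mirrors the layout of the simulation tables, although the numbers will differ from the published ones because the optimizer, starting values and replication count are not the authors'.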

For Comparing Methods under Asymmetric Data
For comparing the classical methods, an application to a real data set is analyzed. We consider the Cramér-Von Mises (CM) and the Anderson-Darling (AD) statistics. The real data set represents the remission time (in months) of an RS of 128 bladder cancer patients (0.08, 2.09, 3 [18]). Table 6 lists the different estimators as well as CM and AD.
From Table 6, the WLS method is the best among all estimation techniques, with CM = 0.03673 and AD = 0.24138; the MLE, CVM, OLS, ADE, RT-ADE and LT-ADE methods also performed well. Figure 4 shows the probability-probability (P-P) plots for comparing estimation methods. Figure 5 shows the estimated CDF (ECDF) plots for comparing estimation methods. Figure 6 provides Kaplan-Meier estimation plots for comparing estimation methods. Figures 4-6 support the results obtained in Table 6.

For Comparing Competitive Models
An application is presented, based on the data set of Cordeiro et al. [18], to show the flexibility of the TLE-NH model. We compare the TLE-NH model with some competitive models such as the Burr type-XII NH (BuXII-NH) [19], Lomax NH (Lx-NH) (Selim [19]), exponentiated exponential (Exp-Exp), beta exponential (B-Exp), Kumaraswamy exponential, TL-NH, inverse generalized power Weibull (IGPW) [20], inverse NH (I-NH), inverse Weibull (IW), inverse Rayleigh (IR), inverse exponential (IE) and NH distributions. The best model is selected using the estimated −log-likelihood, Akaike-Information-Criterion (AIc), Consistent-Akaike-Information-Criteria (CAIc), Bayesian-Information-Criterion (BIc), and Hannan-Quinn Information-Criterion (HQIc). These data have a unimodal HRF shape. The results of this application are listed in Tables 7 and 8. Table 7 lists the MLEs and the standard errors (SEs) for the asymmetric real data. Table 8 lists the statistics for the asymmetric real data. These results show that the TLE-NH distribution has the lowest AIc, CAIc, BIc and HQIc values among all the fitted models. Hence, it could be chosen as the best model under these criteria. Figure 7 gives the total time in test (TTT), box, quantile-quantile (QQ) and nonparametric kernel density estimation (NKDE) plots for the real data. Figure 8 shows the estimated PDF (EPDF), ECDF, EHRF and Kaplan-Meier estimation plots. Clearly, the TLE-NH distribution provides a closer fit to the empirical functions. For these data, we have the following results: E(Z) = 12.21692, V(Z) = 91.23032, S(Z) = 4.546743 and K(Z) = 38.4308.

Figure 8. ECDF, EPDF, EHRF and Kaplan-Meier estimation plots.
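The four selection criteria can be computed from a fitted log-likelihood as sketched below. The formulas are the ones commonly used in this literature and are an assumption here, since the text does not state them; in particular, CAIc is taken as $-2\ell + 2kn/(n - k - 1)$, which other authors define differently. The function name `information_criteria` is hypothetical.

```python
import math


def information_criteria(loglik, k, n):
    """Return AIc, CAIc, BIc and HQIc for a model with k parameters and n observations.

    Assumed definitions (not stated in the text):
      AIc  = -2 l + 2 k
      CAIc = -2 l + 2 k n / (n - k - 1)
      BIc  = -2 l + k log(n)
      HQIc = -2 l + 2 k log(log(n))
    """
    base = -2.0 * loglik
    return {
        "AIc": base + 2.0 * k,
        "CAIc": base + 2.0 * k * n / (n - k - 1),
        "BIc": base + k * math.log(n),
        "HQIc": base + 2.0 * k * math.log(math.log(n)),
    }
```

For instance, `information_criteria(-410.0, k=3, n=128)` evaluates the criteria for a three-parameter model on 128 observations; the log-likelihood value −410.0 is a made-up placeholder, not a result from Table 8. Smaller values indicate a better trade-off between fit and complexity.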

Conclusions
In this paper, we have introduced a new flexible extension of the Nadarajah and Haghighi model called the Topp-Leone exponentiated Nadarajah and Haghighi (TLE-NH) model. The PDF of the TLE-NH model can be expressed as a simple linear representation of the exponentiated NH density. Some of its statistical properties have been derived and studied in detail. The HRF can take different shapes, such as constant, increasing-constant, decreasing-constant, bathtub, upside down and upside down-U, which makes the TLE-NH model able to analyze different types of data sets in various fields. Moreover, the TLE-NH model can be used to model both negatively and positively skewed data. The model parameters have been estimated using various estimation methods. Monte Carlo simulation experiments have been performed to compare the estimation methods. Finally, a real data set has been analyzed to illustrate the flexibility of the proposed model, and the TLE-NH model proved superior in modeling this data set.


Data Availability Statement:
The data set is available in Lee and Wang (2003) and given in Section 5.1.