The Burr X Pareto Distribution : Properties , Applications and VaR Estimation

In this paper, a new three-parameter Pareto distribution is introduced and studied. We discuss various mathematical and statistical properties of the new model. Some estimation methods of the model parameters are performed. Moreover, the peaks-over-threshold method is used to estimate Value-at-Risk (VaR) by means of the proposed distribution. We compare the distribution with a few other models to show its versatility in modelling data with heavy tails. VaR estimation with the Burr X Pareto distribution is presented using time series data, and the new model could be considered as an alternative VaR model against the generalized Pareto model for financial institutions.


Introduction
The Pareto (P) distribution is very versatile, and a variety of uncertainties can be usefully modelled by it.It has several applications in actuarial science, economics, finance, life testing, survival analysis and telecommunications because of its heavy tail properties.The probability density function (pdf) and cumulative distribution function (cdf) of the P distribution are given (for x > β) by: where β > 0 is a scale parameter and α > 0 is a shape parameter.This distribution is a special form of the Pearson Type VI distribution.Since the P distribution has a reversed-J pdf shape and a decreasing hazard rate function (hrf), it may sometimes be insufficient to model data.Generally, practical problems require a wider range of possibilities for the medium risk, for example when the lifetime data present a bathtub-shaped hrf, such as human mortality and machine life cycles.For this reason, researchers developed various extensions and modified forms of the P distribution to obtain a more flexible model with different numbers of parameters.Some of them can be cited as follows: Exponentiated P (EP) (Stoppa 1990;Gupta et al. 1998), Beta P (BP) (Akinsete et al. 2008), Kumaraswamy P (KwP) (Bourguignon et al. 2013), Kumaraswamy generalized P (Nadarajah and Eljabri 2013), P ArcTan (PAT) (Gómez-Déniz and Calderín-Ojeda 2015), exponentiated Weibull P (Afify et al. 2016) and Weibull P This generator can supply the flexibility of pdf and hrf to any baseline distribution model (Yousof et al. 2016).
In this paper, we introduce a new extended P distribution, called the Burr X Pareto (BXP) model, based on the BX-G family.With this idea, we construct the new BXP distribution as more flexible than the P distribution and provide a comprehensive description of some of its mathematical properties.We prove empirically that the BXP model provides better fits than some extensions and generalizations of the P, some of which have one extra model parameter, and the others have the same number of parameters, by means of two applications to real data.We hope that the new distribution will attract wider applications in reliability, engineering and other areas of research.
The rest of the paper is organized as follows.In Section 2, we define the BXP model.In Section 3, we provide a useful mixture representation for its pdf.In Section 4, we derive some of its general mathematical properties.Some estimation methods of the model parameters are performed in Section 5.In Section 6, simulation results to assess the performance of the proposed maximum likelihood estimation procedure are discussed.In Section 7, we provide two applications to real data to illustrate the importance and flexibility of the new family.Value-at-Risk estimation with the BXP distribution is presented in Section 8. Finally, some concluding remarks are presented in Section 9.

The New Model
In this section, we define the BXP model and provide some plots for its pdf and hrf.The BXP cdf is given by: The pdf corresponding to (3) is given by: Lemma 1 provides random number generations from the BXP and some relations and of the BXP distribution with the well-known Burr X and uniform distributions.
Lemma 1.(a) If a random variable Y follows the Burr X distribution with shape parameter δ and scale parameter one, then the random variable X = β(1 + Y) (1/α) follows the BXP(δ, α, β) distribution.(b) If a random variable Y follows the uniform distribution on [0,1], then the random variable: Proof.The proofs of (a) and (b) are obtained by the transformation method.

Expansions of pdf and cdf
In this section, we provide a very useful linear representation for the BXP density function.If |z| < 1 and b > 0 is a real non-integer, the power series holds: For simplicity, ignoring the dependence of G(x) and g(x) on ξ and applying (5) to (4), we have: Applying the power series to the term exp −(i , Equation (6) becomes: Consider the series expansion: ( Applying the expansion in ( 8) to (7) for the term , Equation ( 7) becomes: where: and: Equation ( 9) reveals that the density of X can be expressed as expansions of the EP densities.Therefore, several mathematical properties of the new family can be obtained by knowing those of the EP distribution.Similarly, the cdf of the BXP family can also be expressed as a mixture of EP cdfs given by: where: is the cdf of the EP family with power parameter 2(j + 1) + k.

Properties
In this section, we will provide some mathematical properties of the BXP distribution.

Moments
The r-th ordinary moment of X is given by µ r = E(X r ) = ∞ −∞ x r f (x)dx.By using Equation ( 9), we obtain: ) is the r-th ordinary moment of EP distribution with power parameter 2(j + 1) + k.The j-th order central moment can be obtained by the following relationship: where µ = E(X).
For the skewness and kurtosis coefficients, we have: The values for mean, variance, β 1 and β 2 for selected values of δ, α and β are shown in Table 1.We can say that the BXP model can be useful for various data modelling in terms of skewness and kurtosis.

Residual and Reversed Residual Life
The n-th moment of the residual life, say . ., uniquely determines F(x).The n-th moment of the residual life of X is given by: Therefore, where: is the incomplete beta function.
The Mean Residual Life (MRL) function or the life expectation at age t defined by follows by setting n = 1 in the last equation.
The n-th moment of the reversed residual life, say for t > 0 and n = 1, 2, . . .uniquely determines F(x).We obtain: Then, the n-th moment of the reversed residual life of X becomes: The mean inactivity time (MIT) or mean waiting time is given by M and it can be obtained easily by setting n = 1 in the above equation.

Order Statistics
Order statistics make their appearance in many areas of statistical theory and practice.Let X 1 , . . ., X n be a random sample from the BXP of distributions, and let X (1) , . . ., X (n) be the corresponding order statistics.The pdf of the i-th order statistic, say X i:n , can be written as: Using (3), ( 4) and (10), we get: t w,k π 2(w+1)+k (x), where: The pdf of X i:n can be expressed as: Then, the density function of the BXP order statistics is a mixture of EP densities.Based on the last equation, we note that the properties of X i:n follow from those properties of Y 2w+k+2 .For example, the moments of X i:n can be expressed as:

Estimation Methods
In this section, we consider the maximum likelihood, least square and weighted least square estimation of the parameters of the BXP distribution.

Maximum Likelihood Estimation
We consider the estimation of the unknown parameters of the BXP model from complete samples by the maximum likelihood method.The maximum likelihood estimators (MLEs) of the parameters of the BXP (δ, α, β) model are now discussed.Let x 1 , . . ., x n be a random sample of this distribution with parameter vector Θ = (δ, α, β) .The log-likelihood function for δ is given by: where The last equation can be also maximized either by using the different programs such as R (optim function), SAS (PROC NLMIXED) or by solving the nonlinear likelihood equations obtained by differentiating .We note that since x ∈ (β, ∞), the MLE of the β parameter cannot be obtained in the usual way.Hence, the MLE of β is the first order statistic X (1) (Johnson et al. 1994).
The components of the score vector, , are: and: where: For fixed β, the interval estimation of the model parameters requires the 2 × 2 observed information matrix J(Θ) = {J ij } for i, j = δ, α.The multivariate normal N 2 (0, J( Θ) −1 ) distribution, under standard regularity conditions, can be used to provide approximate confidence intervals for the unknown parameters, where J( Θ) is the total observed information matrix evaluated at Θ.Then, approximate 100(1 − δ)% confidence intervals for δ and α can be determined by: δ ± z ζ/2 J δδ and α ± z ζ/2 J αα , where z ζ/2 is the upper ζ-th percentile of the standard normal model.

Ordinary and Weighted Least Squares
In this section, we use the least square (LS) and weighted least square (WLS) estimators (Swain et al. 1988) to estimate the parameters of the BXP distribution.Let X (1) , . . ., X (n) be the order statistics of a random sample of size n from the BXP defined in (4), then the least square estimators (LSEs) of the unknown parameters δ, α and β of the BXP distribution can be obtained by minimizing: with respect to unknown parameters δ, α and β.
The weighted least square estimators (WLSEs) of the unknown parameters δ, α and β follow by minimizing: with respect to unknown parameters δ, α and β.

Simulation Study
Here, we perform the simulation study for MLEs of the BXP distribution.We generate N = 1000 samples of sizes n = 50, 100, 200 from selected BXP distributions.The random numbers generation is simulated by: , where u is a uniform random number on [0,1].We also calculate the empirical mean, standard deviations (sd), bias and mean square error (MSE) of the MLEs.The empirical bias and MSE are calculated by: respectively, where h = (δ, α, β).All results of MLEs were obtained using the optim-CG routine in the R programme.The empirical results of this simulation study are reported in Table 2. Table 2 shows that when the sample size increases, the empirical means approach the true parameter value.For the same case, the standard deviations, biases and MSEs decrease in all the cases as expected.Therefore, the MLE method works very well to estimate the model parameters of the BXP distribution.

Real Data Modelling
In this section, we present two applications based on the real datasets to show the flexibility of the BXP distribution.The BXP model is compared with the WP, BP, KwP, PAT and P distributions.The cdfs of the above distributions are given (for x > β and a, α, δ > 0) by: and: In order to see the best model, we obtain the Akaike Information Criteria (AIC), Corrected Akaike Information Criterion (CAIC), Bayesian Information Criterion (BIC), Hannan-Quinn Information Criterion (HQIC) and Kolmogorov-Smirnov (KS) goodness of-fit statistic to see the fitting of the models to dataset.In general, the best model can be chose as the one that has the smallest values of the AIC, CAIC, BIC, HQIC and KS statistics.All computations of the MLEs are performed by the maxLik routine in the R program.
The second data-set shows the time intervals of the successive earthquakes in the last century in the North Anatolia fault zone between 39.00 • to 42.00 • north latitude and 39.00 • to 40.00 • east longitude.This dataset was introduced and analysed by Kuş (2007).This dataset is well known as being decreasing hrf-shaped.
For both datasets, the estimated parameters based on the MLE method are given in Table 3, whereas the values of the information criteria and goodness-of-fit statistics are given in Table 4. Since MLE of the β equals the minimum order statistics, we suppose it as known to be the minimum value the dataset.Table 4 shows that the BXP distribution has the lowest values of these statistics among all the fitted models.Hence, it could be chosen as the best model under these criteria for both datasets.
The histogram of these datasets and the estimated pdfs and cdfs of the application models are displayed in Figures 2 and 3. From the this figure, we show that the BXP model provides the best fit to these datasets as compared to other models.

Value-at-Risk Estimation with the BXP Distribution
In this section, the performance of BXP distribution in estimating Value-at-Risk (VaR) is discussed and compared with the Generalized P (GP) distribution.GP is a widely-used distribution in actuarial sciences, economics and statistics to model the tail of the distribution that contains extreme events.VaR is one of the most popular approaches to measure market risk.From a statistical point of view, the VaR entails the estimation of the quantile of the distribution of returns.The VaR for a long position (left tail of the distribution function) over a given time horizon tis defined as: where F is the distribution function of financial losses, F −1 denotes the inverse of F and p is the quantile at which VaR is calculated.
The Peaks-Over-Threshold (POT) method is used to model the tail of the distribution.POT is based on the distribution of exceedances over a given threshold.The conditional excess distribution, F u , can be defined as follows: where random variable X represents the financial losses, u is the threshold, y = X − u are the excesses and x F ≤ ∞ is the right endpoint of F. F u (y) can be re-defined as follows: The Balkema and De Haan (1974) and Pickands (1975) theorem shows that for a sufficiently high threshold u, the excess distribution function F u can be approximated by the GP distribution: where y ≥ 0 for ξ ≥ 0 and 0 ≤ y ≤ σ ξ for ξ < 0 and ξ and σ are shape and scale parameters of the GP distribution, respectively.Isolating F(x) from ( 14), we get: where F u (y) is the GP distribution and F(u) = (n − N u ) n.Then, substituting ( 14) in ( 16), the following estimate for F(x) is obtained: where ξ and σ are maximum likelihood estimates of ξ and σ, respectively.Inverting (17) for a given probability p, VaR p can be obtained as: Threshold selection is a difficult task and an essential part for tail modelling with the GP distribution.The most used method is the Mean Excess (ME) plot for the determination of the threshold.The ME function can be defined as follows: where I is the indicator function.When the empirical ME function is a positively sloped straight line above a certain threshold u, it is evidence that the used dataset follows the GP distribution with a positive ξ parameter.
Here, the BXP distribution is adopted in the POT method.It is assumed that BXP provides a good approximation to F u (y) for a sufficiently high threshold u.Then, substituting the cdf of BXP in ( 16), the new estimate for F(x) can be obtained as: The VaR p can be obtained by inverting (20) for a given probability p, as follows: where β, δ and α are the maximum likelihood estimates of β, δ and α, respectively.

S&P-500
To evaluate and compare the performance of the BXP with GP distribution in terms of VaR accuracy, the S&P-500 index is used.The used time series data contain 1465 daily log returns from 4 January 2012 to 27 October 2017.The descriptive statistics of S&P-500 are given in Table 4.
Table 5 shows that the mean returns are closed to zero.The Jarque-Bera statistics in Table 5 also show that the null hypothesis of normality is rejected at any level of significance, as evidenced by the high excess kurtosis and negative skewness.Thus, it is clear that log returns of S&P-500 indexes have non-normal characteristics, excess kurtosis and fat tails.The result of the Ljung-Box test indicates that the raw returns are free from autocorrelation.Therefore, BXP and GP distributions could be applied to the independent and identically distributed observations.The ME plot is used to determine the optimal threshold value for the POT method.Figure 4 displays the ME plot of the S&P-500 dataset.The optimal threshold could be chosen as 0.02 for the used dataset.It is near the 90% quantile value of the S&P-500.Table 6 shows the estimated parameters of BXP distribution and GP distribution using the POT method for the S&P-500 dataset.Based on the figures in Table 6, we conclude that since the BXP distribution has the lowest values of these statistics, BXP provides better fits than the GP distribution for tail modelling of S&P-500 indexes.Figure 5 displays the fitted pdf and cdfs of the BXP and GP distributions. Figure 5 reveals that the BXP distribution provides superior fits to the used dataset.Here, VaR is estimated with the GP and BXP distribution using the POT method for values of p = 0.95, 0.975 and 0.99.The rolling window estimation method is used to evaluate the out-of-sample performance of the GP and BXP models.The first 1064 daily returns are used as the window length, and the next 400 data points are considered as out-of-sample period.Figure 6 displays daily VaR estimates of the BXP and GP models.Based on Figure 6, it is clear that the BXP and GP models produce similar VaR estimates.Therefore, the BXP model could be considered as an alternative VaR model against to GP model for financial institutions.
In VaR estimation, using the POT method is applied to raw return data assuming the distribution to be stationary or unconditional without considering the time-varying volatility.The POT method can also be considered as a dynamic model, where the conditional distribution of F is taken into account and the volatility of returns is captured.The dynamic POT method based on the BXP distribution, combined with the generalized autoregressive conditional heteroscedasticity type process, introduced by Bollerslev (1986), could be considered as future work of this study.

Conclusions
In this study, we proposed a new distribution that was referred to as the Burr X Pareto (BXP) using the Burr X generator.Some mathematical properties were obtained.The estimation of the model parameters is performed by the MLE, LS and WLS methods.We compare the distribution with a few other models using two real datasets.It is expected that the BXP distribution will serve as a better alternative in modelling real-life datasets.Value-at-Risk estimation with the BXP distribution is presented using time series data, we showed that the new model could be considered as an alternative VaR model against the generalized Pareto model for financial institutions.

Figure 3 .
Figure 3. Fitted pdfs (left panel) and cdfs (right panel) of earthquake data.

Figure 5 .
Figure 5. Fitted pdfs (left) and cdfs (right) of the BXP and GP distribution for the S&P-500 dataset.

Figure 6 .
Figure 6.Daily VaR estimates of the BXP and GP models.

Table 1 .
Mean, variance, coefficients of skewness and kurtosis for different values of parameters.

Table 5 .
Summary statistics for the S&P-500 index.

Table 6 .
MLEs, corresponding standard errors (in second line) and goodness-of-fit statistics for the S&P-500.