Article

Estimation of Beta-Pareto Distribution Based on Several Optimization Methods

by Badreddine Boumaraf 1,2,3,†, Nacira Seddik-Ameur 2,† and Vlad Stefan Barbu 3,*,†
1 Department of Mathematics and Informatics, University of Souk-Ahras, Souk Ahras 41000, Algeria
2 Laboratory of Probability and Statistics LaPS, University of Badji Mokhtar of Annaba, Annaba 23000, Algeria
3 Laboratory of Mathematics Raphaël Salem, University of Rouen-Normandy, 76801 Saint Étienne du Rouvray, France
* Author to whom correspondence should be addressed.
† The authors contributed equally to this work.
Mathematics 2020, 8(7), 1055; https://doi.org/10.3390/math8071055
Submission received: 28 February 2020 / Revised: 3 June 2020 / Accepted: 10 June 2020 / Published: 1 July 2020
(This article belongs to the Special Issue Probability, Statistics and Their Applications)

Abstract: This paper is concerned with the maximum likelihood estimators of the Beta-Pareto distribution introduced in Akinsete et al. (2008), which combines the Beta and Pareto probability distributions. Since these estimators cannot be obtained explicitly, we use nonlinear optimization methods that provide them numerically. The methods we investigate are the Newton-Raphson method, the gradient method and the conjugate gradient method; for the conjugate gradient method we use the scheme of Fletcher-Reeves. The corresponding algorithms are developed and the performance of the methods is confirmed by an extensive simulation study. In order to compare several competing models, namely generalized Beta-Pareto, Beta, Pareto, Gamma and Beta-Pareto, model selection criteria are used. We first consider completely observed data; second, the observations are assumed to be right censored, and we derive the same type of results.

1. Introduction

In this work we are interested in the four-parameter distribution called the Beta-Pareto (BP) distribution, introduced by Akinsete et al. (2008) [1]. This distribution is a generalization of several models, such as the Pareto, log-beta, exponential and arcsine distributions, in the sense that these distributions can be obtained as special cases of the BP distribution by transforming the variable or by setting particular values of the parameters (cf. Reference [1]).
It is well known that the family of Pareto distributions and its generalizations have been extensively used in various fields of application, like income data [2], environmental studies [3] or socio-economic studies [4]. Concerning the generalizations of the Pareto distribution, we can cite the Burr distribution [5], the power function [6] and the logistic distribution. It is also important to stress that heavy-tailed phenomena can be successfully modelled by means of (generalized) Pareto distributions [7]. Note that the BP model is based on the Pareto distribution, which is known to be heavy tailed, as shown in Reference [1] through both theoretical and applied arguments. We would like to stress that this is important in practice, because the BP distribution can describe skewed data better than other distributions proposed in the statistical literature. A classical example of this kind is given by the exceedances of the Wheaton River flood data, which are highly skewed to the right.
All these reasons show the great importance of statistically investigating any generalization of Pareto distributions. Thus the purpose of our work is to provide algorithmic methods for practically computing the maximum likelihood estimators (MLEs) of the parameters of the BP distribution, to carry out intensive simulation studies in order to investigate the quality of the proposed estimators, to consider model selection criteria in order to choose among candidate models, and also to take into account right-censored data, not only completely observed data. Note that right-censored data are often encountered in reliability studies and survival analysis, where experiments are generally conducted over a fixed time period. At the end of the study, we generally cannot observe the failure times of all the products, or the death (remission) times of all the patients, due to loss to follow-up, competing risks (death from other causes) or any combination of these. In such cases, we do not have the opportunity of observing all the survival times, observing only the censoring times. Consequently, it is of crucial importance for this type of data and applications to develop a methodology that allows estimation in the presence of right censoring.
Generally, numerical methods are required when the MLEs of the unknown parameters of a model cannot be obtained in closed form. For the BP distribution, we propose the use of several nonlinear optimization methods: the Newton-Raphson method, and the gradient and conjugate gradient methods. For the Newton-Raphson method, the approximation of the Hessian matrix is widely studied in the literature; in this work, we are interested in the BFGS method (from Broyden-Fletcher-Goldfarb-Shanno), the DFP method (from Davidon-Fletcher-Powell) and the SR1 method (Symmetric Rank 1). For the conjugate gradient method we use the scheme of Fletcher-Reeves.
It is well known that the gradient and conjugate gradient methods behave better than the Newton-Raphson method on this type of problem. Nonetheless, most statistical studies use the Newton-Raphson method instead of the gradient or conjugate gradient methods, perhaps because it is much easier to implement. Our interest is to implement the gradient method and the conjugate gradient method in our framework of BP estimation, and to present the Newton-Raphson method for comparison purposes only.
The structure of the article is as follows. In the next section we introduce the BP distribution and we give some elements of MLE for this model. Section 3 is devoted to the numerical optimization methods used for obtaining the MLEs of the parameters of interest. Firstly, we briefly recall the numerical methods that we use (Newton-Raphson, gradient and conjugate gradient). Secondly, we present the corresponding algorithm for the conjugate gradient method, which is the most complex one. We end this section by investigating through simulations the accuracy of these three methods. In Section 4 we use model selection criteria (AIC, BIC, AICc) in order to choose between several competing models, namely the Generalized Beta-Pareto, Beta, Pareto, Gamma and BP models. In Section 5 we assume that we have right-censored observations at our disposal and we derive the same type of results as for complete observations.

2. BP Distribution and MLEs of the Parameters

A random variable X has a BP distribution with parameters α, β, θ, k > 0 if its probability density function is
f(x; \alpha, \beta, k, \theta) := \frac{k}{\theta B(\alpha,\beta)} \left[ 1 - \left( \frac{x}{\theta} \right)^{-k} \right]^{\alpha-1} \left( \frac{x}{\theta} \right)^{-k\beta-1}, \quad x \geq \theta,
with B(\alpha,\beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta), where \Gamma(x), x > 0, is the Gamma function. Consequently, the support of X is [\theta, +\infty). The corresponding cumulative distribution function can be written as
F(x; \alpha, \beta, k, \theta) = 1 - \frac{(x/\theta)^{-k\beta}}{\beta B(\alpha,\beta)} - \frac{(x/\theta)^{-k\beta}}{B(\alpha,\beta)} \sum_{n=1}^{\infty} \frac{\prod_{i=1}^{n}(i-\alpha)}{n!\,(\beta+n)} \left( \frac{x}{\theta} \right)^{-kn}, \quad x \geq \theta, \; \alpha, \beta, \theta, k > 0.
In Figure 1, Figure 2, Figure 3, Figure 4 and Figure 5, the pdf, survival, cdf, hazard and cumulative hazard functions are presented for several values of the parameters.
Let us now consider an i.i.d. (independent and identically distributed) sample (x_1, x_2, \ldots, x_n) of a random variable X following a BP distribution with density f(x; \alpha, \beta, k, \theta). The corresponding log-likelihood function can be written as
l(\alpha, \beta, k, \theta \mid x_1, \ldots, x_n) = n \ln k - n \ln \theta + n \big( \ln\Gamma(\alpha+\beta) - \ln\Gamma(\alpha) - \ln\Gamma(\beta) \big) + (\alpha-1) \sum_{i=1}^{n} \ln\left[ 1 - \left( \frac{x_i}{\theta} \right)^{-k} \right] - (k\beta+1) \sum_{i=1}^{n} \ln\frac{x_i}{\theta}. \quad (1)
As usual, when we need to see the log-likelihood function as a function of the parameters α , β , k , θ only, we shall write l ( α , β , k , θ ) instead of l ( α , β , k , θ x 1 , x 2 , , x n ) . The score equations are thus given by
\frac{\partial l(\alpha,\beta,k,\theta)}{\partial \alpha} = n \left[ \Psi(\alpha+\beta) - \Psi(\alpha) \right] + \sum_{i=1}^{n} \ln\left[ 1 - \left( \frac{x_i}{\theta} \right)^{-k} \right] = 0, \quad (2)
\frac{\partial l(\alpha,\beta,k,\theta)}{\partial \beta} = n \left[ \Psi(\alpha+\beta) - \Psi(\beta) \right] - k \sum_{i=1}^{n} \ln\frac{x_i}{\theta} = 0, \quad (3)
\frac{\partial l(\alpha,\beta,k,\theta)}{\partial k} = \frac{n}{k} - \sum_{i=1}^{n} \left[ \beta + (\alpha-1) \left( 1 - \left( \frac{x_i}{\theta} \right)^{k} \right)^{-1} \right] \ln\frac{x_i}{\theta} = 0, \quad (4)
where the function Ψ is the digamma function, \Psi(x) := \Gamma'(x)/\Gamma(x), x > 0.
Since x \geq \theta, the MLE of the parameter θ is the first order statistic x_{(1)}, and the MLEs of α, β and k are obtained by solving the system (2)–(4).
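To make the estimation procedure concrete, here is a minimal R sketch of the log-likelihood (1) and of the score vector (2)–(4); the names bp_loglik and bp_score are ours, not the authors', θ is passed as a fixed value (in practice θ̂ = x_{(1)}, possibly shrunk slightly below the sample minimum so that the logarithm in (2) stays finite), and digamma() is the function Ψ.

```r
# Minimal sketch (not the authors' code): BP log-likelihood (1) and
# score vector (2)-(4), with theta held fixed.
bp_loglik <- function(par, x, theta) {
  a <- par[1]; b <- par[2]; k <- par[3]
  r <- x / theta; n <- length(x)
  n * log(k) - n * log(theta) - n * lbeta(a, b) +
    (a - 1) * sum(log(1 - r^(-k))) - (k * b + 1) * sum(log(r))
}

bp_score <- function(par, x, theta) {
  a <- par[1]; b <- par[2]; k <- par[3]
  r <- x / theta; n <- length(x)
  c(n * (digamma(a + b) - digamma(a)) + sum(log(1 - r^(-k))),
    n * (digamma(a + b) - digamma(b)) - k * sum(log(r)),
    n / k - sum((b + (a - 1) / (1 - r^k)) * log(r)))
}
```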
Using Equations (2)–(4) we compute the elements of the Fisher information matrix as follows [1]:
\frac{\partial^2 l}{\partial \alpha^2} = n \left[ \Psi'(\alpha+\beta) - \Psi'(\alpha) \right], \qquad \frac{\partial^2 l}{\partial \alpha \partial \beta} = n \Psi'(\alpha+\beta), \qquad \frac{\partial^2 l}{\partial k \partial \beta} = - \sum_{i=1}^{n} \ln\frac{x_i}{\theta},
\frac{\partial^2 l}{\partial k \partial \alpha} = \sum_{i=1}^{n} \ln\frac{x_i}{\theta} \left[ \left( \frac{x_i}{\theta} \right)^{k} - 1 \right]^{-1}, \qquad \frac{\partial^2 l}{\partial k^2} = - \frac{n}{k^2} - (\alpha-1) \sum_{i=1}^{n} \left( \frac{x_i}{\theta} \right)^{k} \left( \ln\frac{x_i}{\theta} \right)^2 \left[ 1 - \left( \frac{x_i}{\theta} \right)^{k} \right]^{-2},
\frac{\partial^2 l}{\partial \beta^2} = n \left[ \Psi'(\alpha+\beta) - \Psi'(\beta) \right],
where \Psi' denotes the derivative of Ψ (the trigamma function).
There are no closed formulas for solving these equations, so numerical methods are required.

3. Numerical Optimization for Solving the Score Equations

In this section we are interested in proposing numerical methods for solving the score equations of the MLEs of the BP parameters. We will consider three methods: the Newton-Raphson method, the gradient method and the conjugate gradient method. Firstly, we will recall the general lines of these optimization methods. Secondly, we will give a detailed description of the application of the most complex method, namely the conjugate gradient one, for computing the MLEs of the BP distribution. We close the section by presenting numerical results of the implementation of these methods in our BP framework.

3.1. General Description of Optimization Methods

Let us first recall the main lines of the three numerical methods that we have considered. The objective of these methods is to minimize/maximize a function l: \mathbb{R}^p \to \mathbb{R}; in our case, p = 3 (the three parameters of the BP distribution to be estimated, α, β, k). As mentioned in the previous section, although the BP distribution has 4 parameters (α, β, k and θ), the MLE of the fourth parameter θ is immediately obtained as the first order statistic, \hat{\theta} = x_{(1)}. For this reason, although the function l has 4 arguments, only 3 of them are actually involved in the optimization.

3.1.1. Newton-Raphson’s Method

Newton’s method: Let us denote by V := (α, β, k, θ) a current point of the function l, and let \nabla l(V_k) be the gradient and H_k the Hessian of l at the iterate V_k. Newton’s method consists in taking the descent direction
d_k = - H_k^{-1} \nabla l(V_k).
Quasi-Newton methods: In practice, the inverse of the Hessian, H_k^{-1}, is often very difficult to evaluate when the function l is not analytic, while the gradient is usually more or less accessible (by inverse methods). As the Hessian cannot be computed exactly, we try to evaluate an approximation. Among the methods that approximate the Hessian, three are retained here: the BFGS method (for Broyden-Fletcher-Goldfarb-Shanno), the DFP method (for Davidon-Fletcher-Powell) and the SR1 method (for Symmetric Rank 1) [8].
We introduce the notation s_k = V_{k+1} - V_k and y_k = \nabla l(V_{k+1}) - \nabla l(V_k), and we choose H_0 to be a positive definite matrix; for convexity reasons, the identity matrix is usually chosen.
Update of the Hessian by the BFGS method: In this approach, the approximation of the Hessian is updated by
H_{k+1} = H_k + \frac{y_k y_k^T}{y_k^T s_k} - \frac{H_k s_k s_k^T H_k}{s_k^T H_k s_k}.
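In R, one BFGS update of the Hessian approximation can be sketched as follows (bfgs_update is our name; s and y are the vectors s_k and y_k defined above):

```r
# Sketch of one BFGS update: H_{k+1} from H_k, s_k and y_k.
bfgs_update <- function(H, s, y) {
  s <- matrix(s, ncol = 1); y <- matrix(y, ncol = 1)
  H + (y %*% t(y)) / drop(t(y) %*% s) -
    (H %*% s %*% t(s) %*% H) / drop(t(s) %*% H %*% s)
}
```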

3.1.2. Gradient Method

This algorithm is based on the fact that, in the vicinity of a point V, the function l decreases most strongly in the direction opposite to the gradient of l,
d = - \nabla l(V).
Let us fix an arbitrary ϵ > 0 . The algorithm can be described as follows:
  • Step 0 (Initialization): choose V_0 = (\alpha_0, \beta_0, k_0, \theta_0) satisfying the conditions on the parameters of the BP distribution; set k = 0 and go to Step 1.
  • Step 1: Compute d_k = -\nabla l(V_k); if \|\nabla l(V_k)\| \leq \epsilon, then STOP; if not, go to Step 2.
  • Step 2: Compute \lambda_k, the solution of the problem
    \min_{\lambda \in \mathbb{R}_+} l(V_k + \lambda d_k),
    and the new iterate
    V_{k+1} = V_k - \lambda_k \nabla l(V_k).
    Set k = k + 1 and go to Step 1.
The algorithm that uses this descent direction is called the gradient algorithm or the steepest descent algorithm. The number of iterations it requires depends on the regularity of the function l. In practice, we often observe that -\nabla l(V) is a good descent direction, but the convergence to the solution is generally slow. Despite its poor numerical performance, this algorithm is worth studying.
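A compact R sketch of this scheme follows; it is written for maximization of the log-likelihood, so the move is along +\nabla l, and the bounded search interval for the step is our assumption:

```r
# Sketch of the gradient (steepest ascent) algorithm of this subsection,
# written for maximization of the log-likelihood.
grad_ascent <- function(V0, f, grad, eps = 1e-6, max_iter = 10000) {
  V <- V0
  for (it in seq_len(max_iter)) {
    g <- grad(V)
    if (sqrt(sum(g^2)) <= eps) break           # Step 1: stopping rule
    lam <- optimize(function(l) f(V + l * g),  # Step 2: choose lambda_k
                    interval = c(0, 1), maximum = TRUE)$maximum
    V <- V + lam * g                           # ascent step
  }
  V
}
```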

3.1.3. Conjugate Gradient Method

The conjugate gradient method is one of the most famous and widely used methods for solving minimization problems. It is mostly used for large-scale problems. This method was proposed in 1952 by Hestenes and Stiefel for the minimization of strictly convex quadratic functions [9].
Many mathematicians have extended this method to the nonlinear (non-quadratic) case. This was done for the first time in 1964 by Fletcher and Reeves [10], then in 1969 by Polak and Ribière [11]; another variant was studied in 1987 by Fletcher [12]. The strategy adopted is to use a recursive sequence
V_{k+1} = V_k + \lambda_k d_k,
where \lambda_k is a suitably chosen positive real constant called the “step” and d_k is a non-zero real vector called the “direction”. Since the conjugate gradient algorithm is applied here to non-quadratic functions, it should be noted that nonlinear conjugate gradient methods are not unique.
Algorithm for the conjugate gradient method:
The algorithm is initialized by Step 0 of the simple gradient method.
0. Set d_0 = -\nabla l(V_0). As long as a convergence criterion is not satisfied, repeat Steps 1–5:
1. Determine a step \lambda_k by some line search method and compute the new iterate
V_{k+1} = V_k + \lambda_k d_k.
2. Evaluate the new gradient
\nabla l(V_{k+1}).
3. Compute the real \omega_{k+1} by some method, for example that of Fletcher and Reeves (see below),
\omega_{k+1} = \frac{\nabla l(V_{k+1})^T \nabla l(V_{k+1})}{\nabla l(V_k)^T \nabla l(V_k)}.
4. Construct the new descent direction:
d_{k+1} = -\nabla l(V_{k+1}) + \omega_{k+1} d_k.
5. Increment k = k + 1.
Several methods exist for computing the term \omega_{k+1}; in the sequel we are concerned with the methods of Fletcher-Reeves, Polak-Ribière and Hestenes-Stiefel (a variant of the Fletcher-Reeves method). It should be noted that this last method is particularly effective when the function is quadratic and the line search is carried out exactly. The corresponding \omega_{k+1} for these methods is given by:
Fletcher-Reeves:
\omega_{k+1} = \frac{\nabla l(V_{k+1})^T \nabla l(V_{k+1})}{\nabla l(V_k)^T \nabla l(V_k)}; \quad (5)
Polak-Ribière:
\omega_{k+1} = \frac{\nabla l(V_{k+1})^T \big( \nabla l(V_{k+1}) - \nabla l(V_k) \big)}{\nabla l(V_k)^T \nabla l(V_k)};
Hestenes-Stiefel:
\omega_{k+1} = \frac{\nabla l(V_{k+1})^T \big( \nabla l(V_{k+1}) - \nabla l(V_k) \big)}{d_k^T \big( \nabla l(V_{k+1}) - \nabla l(V_k) \big)}.
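In R, with g_new = \nabla l(V_{k+1}), g_old = \nabla l(V_k) and d the previous direction, the three coefficients can be sketched as:

```r
# The three classical choices of omega_{k+1}; for Hestenes-Stiefel the
# previous direction d enters the denominator.
omega_fr <- function(g_new, g_old)    sum(g_new^2) / sum(g_old^2)
omega_pr <- function(g_new, g_old)    sum(g_new * (g_new - g_old)) / sum(g_old^2)
omega_hs <- function(g_new, g_old, d) sum(g_new * (g_new - g_old)) /
                                        sum(d * (g_new - g_old))
```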
Noting the first-order expansion
l(V_{k+1}) = l(V_k) + (V_{k+1} - V_k)^T \nabla l(V_k) + \mathrm{Rest}(V_{k+1} - V_k),
we have the recurrence
V_{k+1} = V_k + \lambda_k d_k,
where the descent step \lambda_k is obtained by an exact or inexact line search. The descent vector d_k is obtained by a conjugate gradient algorithm based on the recurrence formula
d_k = -\nabla l(V_k) + \omega_k d_{k-1},
where
\omega_k = \frac{\| \nabla l(V_k) \|^2}{\| \nabla l(V_{k-1}) \|^2}.
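Combining the pieces, a hedged R sketch of the Fletcher-Reeves conjugate gradient iteration (reusing omega_fr from the sketch above, and again written for maximization of the log-likelihood) is:

```r
# Sketch of the nonlinear conjugate gradient algorithm with the
# Fletcher-Reeves coefficient, for maximization.
cg_fletcher_reeves <- function(V0, f, grad, eps = 1e-6, max_iter = 10000) {
  V <- V0; g <- grad(V); d <- g                 # d_0 = gradient (ascent)
  for (it in seq_len(max_iter)) {
    if (sqrt(sum(g^2)) <= eps) break
    lam <- optimize(function(l) f(V + l * d),   # line search for lambda_k
                    interval = c(0, 1), maximum = TRUE)$maximum
    V <- V + lam * d                            # new iterate
    g_new <- grad(V)                            # new gradient
    d <- g_new + omega_fr(g_new, g) * d         # new direction
    g <- g_new
  }
  V
}
```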

3.2. Optimization Methods for Computing the MLEs of the BP Distribution

At this stage, we are interested in describing the application of the three optimization methods for computing the MLEs of the BP distribution. In the sequel we present in detail only the most complex one, namely the conjugate gradient method. Following the same lines, the other two methods can also be adapted for computing the MLEs of the BP distribution. Note also that we give numerical results for all three methods in the next section.
So, our purpose is to numerically determine the MLEs of the BP parameters by calculating the maximum of the log-likelihood function l ( α , β , k , θ ) given in (1).
As we have previously mentioned, the MLE of the parameter θ is the first order statistic, so, in the sequel, the iterates \theta_0 = \theta_1 = \cdots = x_{(1)} are fixed.
To compute the MLEs of the parameters, we proceed as follows:
1. Step 0: Initialization V 0 = ( α 0 , β 0 , k 0 , θ 0 ) .
2. Step 1: Computation of the gradient
\nabla l(\alpha, \beta, k, \theta) = \begin{pmatrix} \partial l / \partial \alpha \\ \partial l / \partial \beta \\ \partial l / \partial k \end{pmatrix} = \begin{pmatrix} n \left[ \Psi(\alpha+\beta) - \Psi(\alpha) \right] + \sum_{i=1}^{n} \ln\left[ 1 - (x_i/\theta)^{-k} \right] \\ n \left[ \Psi(\alpha+\beta) - \Psi(\beta) \right] - k \sum_{i=1}^{n} \ln(x_i/\theta) \\ \frac{n}{k} - \sum_{i=1}^{n} \left[ \beta + (\alpha-1) \left( 1 - (x_i/\theta)^{k} \right)^{-1} \right] \ln(x_i/\theta) \end{pmatrix}.
If \|\nabla l(\alpha, \beta, k, \theta)\| \leq \epsilon, then stop: V_0 = (\alpha_0, \beta_0, k_0, \theta_0) is the optimal vector.
Else: go to the next step.
3. Step 2: Computation of the direction of descent
If i = 0 :
d_0 = -\nabla l(\alpha_0, \beta_0, k_0, \theta_0) = - \begin{pmatrix} n \left[ \Psi(\alpha_0+\beta_0) - \Psi(\alpha_0) \right] + \sum_{i=1}^{n} \ln\left[ 1 - (x_i/\theta_0)^{-k_0} \right] \\ n \left[ \Psi(\alpha_0+\beta_0) - \Psi(\beta_0) \right] - k_0 \sum_{i=1}^{n} \ln(x_i/\theta_0) \\ \frac{n}{k_0} - \sum_{i=1}^{n} \left[ \beta_0 + (\alpha_0-1) \left( 1 - (x_i/\theta_0)^{k_0} \right)^{-1} \right] \ln(x_i/\theta_0) \end{pmatrix}.
If i > 0: d_{i+1} = -\nabla l(\alpha_i, \beta_i, k_i, \theta_i) + \omega_i d_i; when we use the method of Fletcher-Reeves (5), we have
\omega_i^{FR} = \frac{\| \nabla l(\alpha_{i+1}, \beta_{i+1}, k_{i+1}, \theta_{i+1}) \|^2}{\| \nabla l(\alpha_i, \beta_i, k_i, \theta_i) \|^2}.
4. Step 3: Computation of V 1 = ( α 1 , β 1 , k 1 , θ 1 )
V_1 = V_0 + \lambda_0 d_0, where d_0 = -\nabla l(\alpha_0, \beta_0, k_0, \theta_0).
Determination of a step λ 0
We can find \lambda_0 with exact or inexact line search methods. In our case, the use of the exact line search helps us to achieve fast convergence. The exact line search consists in solving the problem
\min_{\lambda_0 \in \mathbb{R}_+} l(V_0 + \lambda_0 d_0).
For this, we will look for the value of λ 0 which cancels the first derivative of the function l ( V 0 + λ 0 d 0 ) ,
\alpha_0 - \lambda_0 \left\{ n \left[ \Psi(\alpha_0+\beta_0) - \Psi(\alpha_0) \right] + \sum_{i=1}^{n} \ln\left[ 1 - (x_i/\theta_0)^{-k_0} \right] \right\} = 0,
\beta_0 - \lambda_0 \left\{ n \left[ \Psi(\alpha_0+\beta_0) - \Psi(\beta_0) \right] - k_0 \sum_{i=1}^{n} \ln(x_i/\theta_0) \right\} = 0,
k_0 - \lambda_0 \left\{ \frac{n}{k_0} - \sum_{i=1}^{n} \left[ \beta_0 + (\alpha_0-1) \left( 1 - (x_i/\theta_0)^{k_0} \right)^{-1} \right] \ln(x_i/\theta_0) \right\} = 0.
Thus we can deduce the exact value of λ 0
\lambda_0 = \frac{\alpha_0 + \beta_0 + k_0}{2n\Psi(\alpha_0+\beta_0) - n\Psi(\alpha_0) - n\Psi(\beta_0) - k_0 \sum_{i=1}^{n} \ln(x_i/\theta_0) + \sum_{i=1}^{n} \ln\left[ 1 - (x_i/\theta_0)^{-k_0} \right] + \frac{n}{k_0} - \sum_{i=1}^{n} \left[ \beta_0 + (\alpha_0-1) \left( 1 - (x_i/\theta_0)^{k_0} \right)^{-1} \right] \ln(x_i/\theta_0)}.
If \lambda_0 is positive, we accept it and go to the next step. If not, we use inexact line search methods to obtain an approximation of the optimal value \lambda_0 and then go to the next step. The main inexact line search methods are those of Armijo [13], Goldstein [14], Wolfe [15] and strong Wolfe [16].
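In R, with bp_score from Section 2, par0 = (\alpha_0, \beta_0, k_0) and theta_hat the fixed estimate of θ (our names), this exact step can be sketched as:

```r
# Sketch: the closed-form lambda_0 above equals the sum of the current
# parameter values divided by the sum of the three score components.
lambda0 <- sum(par0) / sum(bp_score(par0, x, theta_hat))
if (lambda0 <= 0) {
  # fall back to an inexact (e.g., Armijo-type) line search here
  stop("lambda0 not positive: use an inexact line search")
}
```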
Construction of the vector V 1
V_1 = V_0 + \lambda_0 d_0 = \begin{pmatrix} \alpha_0 - \lambda_0 \left\{ n \left[ \Psi(\alpha_0+\beta_0) - \Psi(\alpha_0) \right] + \sum_{i=1}^{n} \ln\left( 1 - (x_i/\theta_0)^{-k_0} \right) \right\} \\ \beta_0 - \lambda_0 \left\{ n \left[ \Psi(\alpha_0+\beta_0) - \Psi(\beta_0) \right] - k_0 \sum_{i=1}^{n} \ln(x_i/\theta_0) \right\} \\ k_0 - \lambda_0 \left\{ \frac{n}{k_0} - \sum_{i=1}^{n} \left[ \beta_0 + (\alpha_0-1) \left( 1 - (x_i/\theta_0)^{k_0} \right)^{-1} \right] \ln(x_i/\theta_0) \right\} \end{pmatrix},
where λ 0 was previously obtained.
Set i = i + 1 and go to Step 1.
Based on Zoutendijk’s theorem and on global convergence results for (Riemannian) conjugate gradient methods [17,18], the convergence of the algorithm is ensured.
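To illustrate how the pieces fit together, here is an end-to-end R sketch reusing bp_loglik, bp_score and cg_fletcher_reeves from above. BP variates are drawn by inverse transform: if U ~ Beta(α, β), then X = \theta (1-U)^{-1/k} has the BP density of Section 2. The starting values and the slight shrinkage of θ̂ (to keep the logarithms finite at the sample minimum) are our choices, not prescriptions from the text.

```r
# End-to-end sketch: simulate BP data and maximize the log-likelihood.
set.seed(1)
alpha <- 23; beta <- 34; k <- 19; theta <- 0.15
x <- theta * (1 - rbeta(200, alpha, beta))^(-1 / k)   # BP sample

theta_hat <- min(x) * (1 - 1e-8)  # MLE of theta, shrunk to avoid log(0)
fit <- cg_fletcher_reeves(
  V0   = c(20, 30, 15),           # starting values (arbitrary)
  f    = function(p) bp_loglik(p, x, theta_hat),
  grad = function(p) bp_score(p, x, theta_hat))
```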

3.3. Numerical Results

We carried out an extensive simulation study using the R software. In the sequel we present the results obtained by means of the three optimization methods.
In Table 1, Table 2 and Table 3, the MLEs of the parameters α, β and k of the BP distribution are presented, together with their standard deviations (SDs), for the three optimization methods. Note the convergence of the estimators obtained with the gradient and conjugate gradient methods, and note also that Newton’s method does not converge for such a non-quadratic, nonlinear function.
Tables 4–6 present the MLEs of the parameters for the conjugate gradient and gradient methods, as well as the bias and the mean square errors (MSEs) of the estimators. Note that the MSEs of the estimators take very small values, smaller than 10^{-4}.
The interest of the conjugate gradient method comes from the fact that it converges quickly towards the optimum; one can show that in N dimensions it needs at most N steps if the function is exactly quadratic. The drawback of Newton’s method is that it requires the Hessian of the function in order to determine the descent step. We have seen that the conjugate gradient algorithm chooses the descent directions through V = (α, β, k, θ) optimally.
In Figure 6, Figure 7 and Figure 8 we also present the evolution of the MSEs of the computed estimators. The three numerical methods are carried out for m = 10,000 iterations.
As we see in these figures, the MLEs of the parameters α, β and k obtained by Newton’s, gradient and conjugate gradient methods are \sqrt{n}-consistent, and the conjugate gradient method gives the best results. In this way, we have numerically checked the well-known properties of MLEs, namely consistency and asymptotic normality, by verifying that the mean square errors decrease at the expected rate, which confirms the theory behind the methods used.

4. Model Selection

We also want to take into account model selection criteria in order to choose among several candidate models: the BP, Beta, Pareto, Gamma and Generalized Beta-Pareto distributions. The Generalized BP (GBP) distribution is a 5-parameter distribution introduced in Reference [19], with a different form given in Reference [20]. The density of this distribution is defined by
g(x; \alpha, \beta, k, \theta, c) := \frac{c}{B(\alpha,\beta)} \left[ 1 - \left( \frac{\theta}{x} \right)^{k} \right]^{c\alpha - 1} \left\{ 1 - \left[ 1 - \left( \frac{\theta}{x} \right)^{k} \right]^{c} \right\}^{\beta - 1} \frac{k}{\theta} \left( \frac{\theta}{x} \right)^{k+1}, \quad x \geq \theta > 0,
with the parameters α, β, θ, c, k > 0.
For the model selection problem, we considered the general information criterion (GIC)
GIC(K) = -2 \ln L + I \times K,
where L is the (maximized) likelihood, K is the number of parameters of the model and I is a penalty index. The following well-known criteria are obtained for different values of I:
AIC(K) = -2 \ln L + 2K,
BIC(K) = -2 \ln L + \ln(n) K
and
AICc(K) = AIC(K) + \frac{2K(K+1)}{n - K - 1},
where n is the sample size.
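As a small illustration (the helper name gic is ours), the three criteria can be computed jointly in R:

```r
# Sketch: the three criteria for a fitted model with maximized
# log-likelihood logL, K free parameters and sample size n.
gic <- function(logL, K, n) {
  aic  <- -2 * logL + 2 * K
  bic  <- -2 * logL + log(n) * K
  aicc <- aic + 2 * K * (K + 1) / (n - K - 1)
  c(AIC = aic, BIC = bic, AICc = aicc)
}
```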
We simulated data from the BP, Pareto (P), Gamma (G) and GBP distributions. We computed the three criteria AIC, BIC and AICc and recorded in Table 7, Table 8, Table 9 and Table 10 the number of times, out of 1000 iterations, that each of the models is selected (minimum value of the corresponding criterion); a sketch of this bookkeeping is shown below.
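Assuming each candidate model has already been fitted and its maximized log-likelihood is available, the counting step can be sketched as follows (pick_model and the argument layout are our own):

```r
# Sketch: given the maximized log-likelihoods and parameter counts of the
# five candidates, record which model each criterion selects.
pick_model <- function(logL, K, n) {
  crit <- sapply(seq_along(logL), function(i) gic(logL[i], K[i], n))
  apply(crit, 1, which.min)   # one selected model index per criterion
}
# e.g. pick_model(logL, K = c(BP = 4, B = 2, P = 2, G = 2, GBP = 5), n = 100)
```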
Several remarks need to be made. First, since the support of a BP distribution is [\theta, +\infty), it makes sense to compare data from the BP distribution with data from the Gamma, Pareto and GBP distributions. For example, when θ is close to 0, the support of the BP distribution is close to that of the Gamma distribution. The BP distribution reduces to the Pareto distribution when α = β = 1, and the supports coincide.
Second, note that we have added the Beta (B) distribution as one of the candidate distributions to be chosen by the information criteria. Although this is not a “real” candidate (since its support is (0, 1)), we wanted to check the model selection criteria in the presence of an “unusual” model as well. Clearly, it does not make sense to compare data from the BP, GBP, Gamma or Pareto distributions, on the one hand, with data from the Beta distribution, on the other hand. Nonetheless, it is important to have an idea of how the criteria behave in this case.
We can notice that all criteria choose the correct model in all four situations, for moderate or even small values of the sample size.
We can also remark that, when the underlying model is the GBP, for small values of the sample size n (less than 50), the criteria fail to choose the correct model; nonetheless, starting from reasonable values of the sample size n (around 50 or 100), the correct model is chosen by all criteria in most cases. This phenomenon could be generated by the fact that the GBP model has the largest number of parameters, 5, which influences the information criteria.
We have also noticed a strange phenomenon: again when the underlying model is the GBP, for small values of the sample size n, the model preferred by the criteria is the Beta model instead of the GBP, that is to say the “unusual” model, as previously explained. The effect is more pronounced for the BIC criterion, so it is surely related to the penalization due to the number of parameters.
Another remark related to the difference between the BP and GBP models is in order. Comparing Table 7 and Table 10, we notice an asymmetry in the model selection criteria according to whether the underlying true model is the BP or the GBP. On the one hand, when the underlying model is the GBP (Table 7), the BP model is chosen by all criteria in 15–20% of the cases for small sample sizes. On the other hand, when the underlying model is the BP (Table 10), the GBP model is never chosen. Although we think that this phenomenon could be related to the number of parameters of each model, to the presence of the other candidate models (Pareto, Beta, Gamma), or to the particular cases considered in the simulations, we do not have a complete explanation for it.

5. MLEs for Right-Censored Data

Let us consider that the random right censorship C is non-informative. Roughly speaking, censoring is non-informative if the censoring distribution provides no information about the lifetime distribution. If the censoring mechanism is independent of the survival time, then the censoring is non-informative; in fact, in practice non-informative censoring almost always means independent censoring. It is well known that if the censoring variable C depends on the lifetime X, we run into a non-identifiability problem: starting from the observations, we cannot make inference about X or C, because several different distributions of (X, C) could provide the same distribution for the observations. For these reasons, researchers almost always assume independence between censoring and lifetime. In most practical examples this makes sense; for instance, if the lifetime represents the remission time of patients and the censoring comes from loss to follow-up (patients moving to another town, for instance), it is natural to assume independence between censoring and lifetime.
In our case, suppose that the variables X and C have probability density functions f and g and survival functions S and G, respectively; all the information is contained in the couples (T_j, \delta_j), where T_j = \min(X_j, C_j) is the observed time and \delta_j = 1_{(X_j \leq C_j)} is the censoring indicator. The contribution to the likelihood of individual j is then
L_j = \left[ f(t_j \mid V)\, G(t_j) \right]^{\delta_j} \left[ g(t_j)\, S(t_j \mid V) \right]^{1-\delta_j}.
Note that the term \left[ f(t_j \mid V)\, G(t_j) \right]^{\delta_j} corresponds to the case \delta_j = 1, that is, to the case when the lifetime is observed; in this situation, the contribution to the likelihood of the lifetime X is f(t_j \mid V), while the contribution of the censorship is G(t_j). The second term has an analogous interpretation in the case \delta_j = 0, that is, when the censoring time is observed.
We also assume that there are no common parameters between the censoring and the lifetime; consequently, the parameter V does not appear in the distribution of the censorship. The useful part of the likelihood (for obtaining the MLEs of interest, that is, the MLEs of the BP distribution) is then reduced to
L = \prod_{j=1}^{m} f(t_j \mid V)^{\delta_j}\, S(t_j \mid V)^{1-\delta_j}
and the log-likelihood is
l := \ln L = \sum_{j=1}^{m} \delta_j \ln f(t_j \mid V) + \sum_{j=1}^{m} (1-\delta_j) \ln S(t_j \mid V)
= \sum_{j=1}^{m} \delta_j \left[ \ln\frac{k}{\theta B(\alpha,\beta)} + (\alpha-1)\ln\left( 1 - \left(\frac{t_j}{\theta}\right)^{-k} \right) - (k\beta+1)\ln\frac{t_j}{\theta} \right] + \sum_{j=1}^{m} (1-\delta_j) \ln\left[ \frac{(t_j/\theta)^{-k\beta}}{\beta B(\alpha,\beta)} + \frac{(t_j/\theta)^{-k\beta}}{B(\alpha,\beta)} \sum_{n=1}^{\infty} \frac{\prod_{i=1}^{n}(i-\alpha)}{n!\,(\beta+n)} \left(\frac{t_j}{\theta}\right)^{-kn} \right],
where t_j \geq \theta and α, β, k, θ > 0. Consequently we get
l = \ln\frac{k}{\theta} \sum_{j=1}^{m} \delta_j - m \ln B(\alpha,\beta) + (\alpha-1) \sum_{j=1}^{m} \delta_j \ln\left[ 1 - \left(\frac{t_j}{\theta}\right)^{-k} \right] - \sum_{j=1}^{m} \delta_j \ln\frac{t_j}{\theta} - k\beta \sum_{j=1}^{m} \ln\frac{t_j}{\theta} + \sum_{j=1}^{m} (1-\delta_j) \ln\left[ \beta^{-1} + \sum_{n=1}^{\infty} \frac{\prod_{i=1}^{n}(i-\alpha)}{n!\,(\beta+n)} \left(\frac{t_j}{\theta}\right)^{-nk} \right].
Setting
z(t_j, \alpha, \beta, k, \theta) := \beta^{-1} + \sum_{n=1}^{\infty} \frac{\prod_{i=1}^{n}(i-\alpha)}{n!\,(\beta+n)} \left(\frac{t_j}{\theta}\right)^{-nk}
and
z_\alpha(t_j, \alpha, \beta, k, \theta) := \frac{\partial z}{\partial \alpha}(t_j, \alpha, \beta, k, \theta), \qquad z_\beta(t_j, \alpha, \beta, k, \theta) := \frac{\partial z}{\partial \beta}(t_j, \alpha, \beta, k, \theta), \qquad z_k(t_j, \alpha, \beta, k, \theta) := \frac{\partial z}{\partial k}(t_j, \alpha, \beta, k, \theta),
we obtain the score functions
\frac{\partial l(\alpha,\beta,k,\theta)}{\partial \alpha} = m \left[ \Psi(\alpha+\beta) - \Psi(\alpha) \right] + \sum_{j=1}^{m} \delta_j \ln\left[ 1 - \left(\frac{t_j}{\theta}\right)^{-k} \right] + \sum_{j=1}^{m} (1-\delta_j) \frac{z_\alpha(t_j,\alpha,\beta,k,\theta)}{z(t_j,\alpha,\beta,k,\theta)},
\frac{\partial l(\alpha,\beta,k,\theta)}{\partial \beta} = m \left[ \Psi(\alpha+\beta) - \Psi(\beta) \right] - k \sum_{j=1}^{m} \ln\frac{t_j}{\theta} + \sum_{j=1}^{m} (1-\delta_j) \frac{z_\beta(t_j,\alpha,\beta,k,\theta)}{z(t_j,\alpha,\beta,k,\theta)},
\frac{\partial l(\alpha,\beta,k,\theta)}{\partial k} = \frac{1}{k} \sum_{j=1}^{m} \delta_j - \sum_{j=1}^{m} \left[ \beta + \delta_j(\alpha-1)\left( 1 - \left(\frac{t_j}{\theta}\right)^{k} \right)^{-1} \right] \ln\frac{t_j}{\theta} + \sum_{j=1}^{m} (1-\delta_j) \frac{z_k(t_j,\alpha,\beta,k,\theta)}{z(t_j,\alpha,\beta,k,\theta)}.
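A hedged R sketch of this censored log-likelihood follows; the infinite series defining z is truncated at n_max terms (our choice, whose adequacy should be checked in practice), and bp_z, bp_loglik_cens are our names.

```r
# Sketch: z(t, alpha, beta, k, theta) with the series truncated.
bp_z <- function(t, a, b, k, theta, n_max = 50) {
  r <- t / theta
  terms <- vapply(seq_len(n_max), function(n)
    prod(seq_len(n) - a) / (factorial(n) * (b + n)) * r^(-k * n),
    numeric(1))
  1 / b + sum(terms)
}

# Sketch of the right-censored log-likelihood written above.
bp_loglik_cens <- function(par, t, delta, theta, n_max = 50) {
  a <- par[1]; b <- par[2]; k <- par[3]
  r <- t / theta; m <- length(t)
  z <- vapply(t, bp_z, numeric(1), a = a, b = b, k = k,
              theta = theta, n_max = n_max)
  log(k / theta) * sum(delta) - m * lbeta(a, b) +
    (a - 1) * sum(delta * log(1 - r^(-k))) - sum(delta * log(r)) -
    k * b * sum(log(r)) + sum((1 - delta) * log(z))
}
```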

5.1. Conjugate Gradient Method for Parameter MLEs of the BP Distribution with Right-Censored Data

As we have already mentioned, the MLE of the parameter θ is the first order statistic. Let us fix an arbitrary ϵ > 0 . The algorithm based on the conjugate gradient that we propose is as follows.
1. Step 0: Initialization V 0 = ( α 0 , β 0 , k 0 , θ 0 ) .
2. Step 1: Computation of the gradient
\nabla l(\alpha, \beta, k, \theta) = \begin{pmatrix} \partial l / \partial \alpha \\ \partial l / \partial \beta \\ \partial l / \partial k \end{pmatrix} = \begin{pmatrix} m \left[ \Psi(\alpha+\beta) - \Psi(\alpha) \right] + \sum_{j=1}^{m} \delta_j \ln\left[ 1 - (t_j/\theta)^{-k} \right] + \sum_{j=1}^{m} (1-\delta_j) \frac{z_\alpha(t_j,\alpha,\beta,k,\theta)}{z(t_j,\alpha,\beta,k,\theta)} \\ m \left[ \Psi(\alpha+\beta) - \Psi(\beta) \right] - k \sum_{j=1}^{m} \ln(t_j/\theta) + \sum_{j=1}^{m} (1-\delta_j) \frac{z_\beta(t_j,\alpha,\beta,k,\theta)}{z(t_j,\alpha,\beta,k,\theta)} \\ \frac{1}{k} \sum_{j=1}^{m} \delta_j - \sum_{j=1}^{m} \left[ \beta + \delta_j(\alpha-1)\left( 1 - (t_j/\theta)^{k} \right)^{-1} \right] \ln(t_j/\theta) + \sum_{j=1}^{m} (1-\delta_j) \frac{z_k(t_j,\alpha,\beta,k,\theta)}{z(t_j,\alpha,\beta,k,\theta)} \end{pmatrix}.
If \|\nabla l(\alpha, \beta, k, \theta)\| \leq \epsilon, then stop: V_0 = (\alpha_0, \beta_0, k_0, \theta_0) is the optimal vector. If not, go to the next step.
3. Step 2: Computation of the direction of descent
If i = 0 , then
d_0 = -\nabla l(\alpha_0, \beta_0, k_0, \theta_0) = \begin{pmatrix} -m \left[ \Psi(\alpha_0+\beta_0) - \Psi(\alpha_0) \right] - \sum_{j=1}^{m} \delta_j \ln\left[ 1 - (t_j/\theta_0)^{-k_0} \right] - \sum_{j=1}^{m} (1-\delta_j) \frac{z_\alpha(t_j,\alpha_0,\beta_0,k_0,\theta_0)}{z(t_j,\alpha_0,\beta_0,k_0,\theta_0)} \\ -m \left[ \Psi(\alpha_0+\beta_0) - \Psi(\beta_0) \right] + k_0 \sum_{j=1}^{m} \ln(t_j/\theta_0) - \sum_{j=1}^{m} (1-\delta_j) \frac{z_\beta(t_j,\alpha_0,\beta_0,k_0,\theta_0)}{z(t_j,\alpha_0,\beta_0,k_0,\theta_0)} \\ -\frac{1}{k_0} \sum_{j=1}^{m} \delta_j + \sum_{j=1}^{m} \left[ \beta_0 + \delta_j(\alpha_0-1)\left( 1 - (t_j/\theta_0)^{k_0} \right)^{-1} \right] \ln(t_j/\theta_0) - \sum_{j=1}^{m} (1-\delta_j) \frac{z_k(t_j,\alpha_0,\beta_0,k_0,\theta_0)}{z(t_j,\alpha_0,\beta_0,k_0,\theta_0)} \end{pmatrix}.
If i > 0, then d_{i+1} = -\nabla l(\alpha_i, \beta_i, k_i, \theta_i) + \omega_i d_i; using the Fletcher-Reeves method we get
\omega_i^{FR} = \frac{\| \nabla l(\alpha_{i+1}, \beta_{i+1}, k_{i+1}, \theta_{i+1}) \|^2}{\| \nabla l(\alpha_i, \beta_i, k_i, \theta_i) \|^2}.
4. Step 3: Computation of V 1 = ( α 1 , β 1 , k 1 , θ 1 )
V_1 = V_0 + \lambda_0 d_0, such that d_0 = -\nabla l(\alpha_0, \beta_0, k_0, \theta_0).
Computation of a step \lambda_0: We can find \lambda_0 with exact or inexact line search methods. In this case, the exact line search helps us to achieve fast convergence. The exact line search consists in solving the problem
\min_{\lambda_0 \in \mathbb{R}_+} l(V_0 + \lambda_0 d_0).
To solve this problem, we must find the value of \lambda_0 for which the derivative of l(V_0 + \lambda_0 d_0) vanishes:
\alpha_0 - \lambda_0 \left\{ m \left[ \Psi(\alpha_0+\beta_0) - \Psi(\alpha_0) \right] + \sum_{j=1}^{m} \delta_j \ln\left[ 1 - (t_j/\theta_0)^{-k_0} \right] + \sum_{j=1}^{m} (1-\delta_j) \frac{z_\alpha(t_j,\alpha_0,\beta_0,k_0,\theta_0)}{z(t_j,\alpha_0,\beta_0,k_0,\theta_0)} \right\} = 0,
\beta_0 - \lambda_0 \left\{ m \left[ \Psi(\alpha_0+\beta_0) - \Psi(\beta_0) \right] - k_0 \sum_{j=1}^{m} \ln(t_j/\theta_0) + \sum_{j=1}^{m} (1-\delta_j) \frac{z_\beta(t_j,\alpha_0,\beta_0,k_0,\theta_0)}{z(t_j,\alpha_0,\beta_0,k_0,\theta_0)} \right\} = 0,
k_0 - \lambda_0 \left\{ \frac{1}{k_0} \sum_{j=1}^{m} \delta_j - \sum_{j=1}^{m} \left[ \beta_0 + \delta_j(\alpha_0-1)\left( 1 - (t_j/\theta_0)^{k_0} \right)^{-1} \right] \ln(t_j/\theta_0) + \sum_{j=1}^{m} (1-\delta_j) \frac{z_k(t_j,\alpha_0,\beta_0,k_0,\theta_0)}{z(t_j,\alpha_0,\beta_0,k_0,\theta_0)} \right\} = 0.
Consequently
(\alpha_0 + \beta_0 + k_0) - \lambda_0 \left\{ m \left[ \Psi(\alpha_0+\beta_0) - \Psi(\alpha_0) \right] + \sum_{j=1}^{m} \delta_j \ln\left[ 1 - (t_j/\theta_0)^{-k_0} \right] + \sum_{j=1}^{m} (1-\delta_j) \frac{z_\alpha}{z} + m \left[ \Psi(\alpha_0+\beta_0) - \Psi(\beta_0) \right] - k_0 \sum_{j=1}^{m} \ln(t_j/\theta_0) + \sum_{j=1}^{m} (1-\delta_j) \frac{z_\beta}{z} + \frac{1}{k_0} \sum_{j=1}^{m} \delta_j - \sum_{j=1}^{m} \left[ \beta_0 + \delta_j(\alpha_0-1)\left( 1 - (t_j/\theta_0)^{k_0} \right)^{-1} \right] \ln(t_j/\theta_0) + \sum_{j=1}^{m} (1-\delta_j) \frac{z_k}{z} \right\} = 0
and we obtain
\lambda_0 = \frac{\alpha_0 + \beta_0 + k_0}{ m \left[ \Psi(\alpha_0+\beta_0) - \Psi(\alpha_0) \right] + \sum_{j=1}^{m} \delta_j \ln\left[ 1 - (t_j/\theta_0)^{-k_0} \right] + \sum_{j=1}^{m} (1-\delta_j) \frac{z_\alpha}{z} + m \left[ \Psi(\alpha_0+\beta_0) - \Psi(\beta_0) \right] - k_0 \sum_{j=1}^{m} \ln(t_j/\theta_0) + \sum_{j=1}^{m} (1-\delta_j) \frac{z_\beta}{z} + \frac{1}{k_0} \sum_{j=1}^{m} \delta_j - \sum_{j=1}^{m} \left[ \beta_0 + \delta_j(\alpha_0-1)\left( 1 - (t_j/\theta_0)^{k_0} \right)^{-1} \right] \ln(t_j/\theta_0) + \sum_{j=1}^{m} (1-\delta_j) \frac{z_k}{z} },
where z, z_\alpha, z_\beta, z_k are evaluated at (t_j, \alpha_0, \beta_0, k_0, \theta_0).
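As a final hedged sketch, censored data can be generated and fitted by reusing the earlier building blocks; the exponential censoring distribution, the starting values and the numerical gradients (via the numDeriv package, assumed installed) are our choices, not the paper's.

```r
# Sketch: right-censored BP sample and maximization of the censored
# log-likelihood; numDeriv supplies numerical gradients of the z-terms.
set.seed(2)
alpha <- 13; beta <- 23; k <- 18; theta <- 0.15
x    <- theta * (1 - rbeta(300, alpha, beta))^(-1 / k)  # lifetimes
cens <- theta + rexp(300, rate = 2)                     # censoring times
t_obs <- pmin(x, cens)
delta <- as.numeric(x <= cens)                          # 1 = observed
theta_hat <- min(t_obs) * (1 - 1e-8)

fit <- cg_fletcher_reeves(
  V0   = c(10, 20, 15),
  f    = function(p) bp_loglik_cens(p, t_obs, delta, theta_hat),
  grad = function(p) numDeriv::grad(function(q)
           bp_loglik_cens(q, t_obs, delta, theta_hat), p))
```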

5.2. Numerical Results for BP Parameter Estimation with Censored Data

To show the efficiency of the MLEs obtained by the gradient and conjugate gradient methods when data are right censored, an extensive simulation study was carried out. The results of the MLEs and their SDs are summarized in Table 11 and Table 12.
To evaluate and compare the performance of the estimators obtained by the proposed methods, we present in Table 13 and Table 14 the bias and MSEs of the estimators.

5.3. Model Selection for Censored Data

In this section, censored data are used with the AIC, BIC and AICc criteria to choose the best statistical model for our statistical population. These results are presented in Table 15, Table 16, Table 17 and Table 18. As for the uncensored data, we can conclude that the information criteria choose the correct model even for small or medium values of the sample size n. Note that remarks similar to those made in Section 4 for model selection with complete (uncensored) data hold true here as well.

6. Conclusions

In this paper, we have developed different optimization methods to determine the maximum likelihood estimators of the four parameters of the Beta-Pareto distribution. The results obtained showed that all the methods used give \sqrt{n}-consistent estimators for both complete and right-censored data samples, and that the conjugate gradient method in particular gives the best results. Using classical model selection criteria, our study shows that this model can be used instead of several alternative models such as the Beta, Pareto, Gamma and Generalized Beta-Pareto distributions. Another important contribution of our work is the use of right-censored data as input for the estimation procedure and the model selection techniques.

Author Contributions

All authors have equally contributed to different stages of the article. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Akinsete, A.; Famoye, F.; Lee, C. The Beta-Pareto distribution. Statistics 2008, 42, 547–563. [Google Scholar] [CrossRef]
  2. Asmussen, S. Applied Probability and Queues; Springer: New York, NY, USA, 2003. [Google Scholar]
  3. Zhang, J. Likelihood moment estimation for the generalized Pareto distribution. Aust. N. Z. J. Stat. 2007, 49, 69–77. [Google Scholar] [CrossRef]
  4. Levy, M.; Levy, H. Investment talent and the Pareto wealth distribution: Theoretical and experimental analysis. Rev. Econ. Stat. 2003, 85, 709–725. [Google Scholar] [CrossRef]
  5. Burroughs, S.M.; Tebbens, S.F. Upper-truncated power law distributions. Fractals 2001, 9, 209–222. [Google Scholar] [CrossRef]
  6. Newman, M.E. Power laws, Pareto distributions and Zipf’s law. Contemp. Phys. 2005, 46, 323–351. [Google Scholar] [CrossRef] [Green Version]
  7. Choulakian, V.; Stephens, M.A. Goodness-of-fit tests for the generalized Pareto distribution. Technometrics 2001, 43, 478–484. [Google Scholar] [CrossRef]
  8. Conn, A.R.; Gould, N.I.M.; Toint, P.L. Convergence of quasi-Newton matrices generated by the symmetric rank one update. Math. Program. 1991, 50, 177–195. [Google Scholar] [CrossRef]
  9. Hestenes, M.R.; Stiefel, E. Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bureau Stand. 1952, 49, 409–436. [Google Scholar] [CrossRef]
  10. Fletcher, R.; Reeves, C.M. Function minimization by conjugate gradients. Computer J. 1964, 7, 149–154. [Google Scholar] [CrossRef] [Green Version]
  11. Polak, E.; Ribiere, G. Note sur la convergence de méthodes de directions conjuguées. ESAIM Math. Model. Numer. Anal. 1969, 3, 35–43. (In French) [Google Scholar] [CrossRef]
  12. Fletcher, R. Practical Methods of Optimization, 2nd ed.; Wiley: Hoboken, NJ, USA, 2000. [Google Scholar]
  13. Armijo, L. Minimization of functions having Lipschitz continuous first partial derivatives. Pac. J. Math. 1966, 16, 1–3. [Google Scholar] [CrossRef] [Green Version]
  14. Goldstein, A.A. On steepest descent. J. Soc. Ind. Appl. Math. Ser. A Control 1965, 3, 147–151. [Google Scholar] [CrossRef]
  15. Wolfe, P. Convergence conditions for ascent methods. SIAM Rev. 1969, 11, 226–235. [Google Scholar] [CrossRef]
  16. Fletcher, R.; Powell, M.J.D. A rapidly convergent descent method for minimization. Comp. J. 1963, 6, 163–168. [Google Scholar] [CrossRef]
  17. Sato, H.; Iwai, T. A new, globally convergent Riemannian conjugate gradient method. Optimization 2013, 64, 1011–1031. [Google Scholar] [CrossRef] [Green Version]
  18. Dai, Y.H.; Yuan, Y. A nonlinear conjugate gradient method with a strong global convergence property. SIAM J. Optim. 1999, 10, 177–182. [Google Scholar] [CrossRef] [Green Version]
  19. Nassar, M.M.; Nada, N.K. The beta generalized Pareto distribution. J. Stat. Adv. Theory Appl. 2011, 6, 1–17. [Google Scholar]
  20. Mahmoudi, E. The beta generalized Pareto distribution with application to lifetime data. Math. Comp. Simul. 2011, 81, 2414–2430. [Google Scholar] [CrossRef]
Figure 1. Density of the BP distribution.
Figure 2. Survival function of the BP distribution.
Figure 3. Cumulative distribution function of the BP distribution.
Figure 4. Hazard rate of the BP distribution.
Figure 5. Cumulative hazard rate of the BP distribution.
Figure 6. The MSE of α̂ computed with Newton’s, gradient and conjugate gradient methods.
Figure 7. The MSE of β̂ computed with Newton’s, gradient and conjugate gradient methods.
Figure 8. The MSE of k̂ computed with Newton’s, gradient and conjugate gradient methods.
Table 1. Results of simulation with the conjugate gradient method for m = 10,000 iterations, with α = 23, β = 34, k = 19, θ = 0.15.

n | α̂ | β̂ | k̂ | SD(α̂) | SD(β̂) | SD(k̂)
10 | 22.7873 | 33.7865 | 19.9800 | 1.06 × 10⁻⁴ | 1.06 × 10⁻⁴ | 9.44 × 10⁻⁶
15 | 22.7910 | 33.7902 | 19.9802 | 1.04 × 10⁻⁴ | 1.04 × 10⁻⁴ | 9.47 × 10⁻⁶
20 | 22.7947 | 33.7939 | 19.9804 | 1.02 × 10⁻⁴ | 1.04 × 10⁻⁴ | 1.11 × 10⁻⁵
30 | 22.8020 | 33.8012 | 19.9808 | 9.89 × 10⁻⁵ | 9.97 × 10⁻⁵ | 1.01 × 10⁻⁵
40 | 22.8094 | 33.8086 | 19.9812 | 9.52 × 10⁻⁵ | 9.62 × 10⁻⁵ | 1.02 × 10⁻⁵
50 | 22.8168 | 33.8160 | 19.9906 | 9.15 × 10⁻⁵ | 9.17 × 10⁻⁵ | 4.26 × 10⁻⁶
100 | 22.8536 | 33.8528 | 19.9926 | 7.31 × 10⁻⁵ | 7.35 × 10⁻⁵ | 3.72 × 10⁻⁶
150 | 22.8904 | 33.8896 | 19.9946 | 5.47 × 10⁻⁵ | 5.52 × 10⁻⁵ | 2.83 × 10⁻⁶
200 | 22.9273 | 33.9265 | 19.9966 | 3.63 × 10⁻⁵ | 3.65 × 10⁻⁵ | 1.45 × 10⁻⁶
300 | 23.0009 | 34.0001 | 20.0006 | 4.78 × 10⁻⁷ | 4.01 × 10⁻⁷ | 8.09 × 10⁻⁷
Table 2. Results of simulation with the gradient method for m = 10,000 iterations, with α = 23, β = 34, k = 19, θ = 0.15.

n | α̂ | β̂ | k̂ | SD(α̂) | SD(β̂) | SD(k̂)
10 | 22.7873 | 33.7865 | 19.7790 | 2.12 × 10⁻⁵ | 2.12 × 10⁻⁵ | 2.20 × 10⁻⁵
15 | 22.7910 | 33.7902 | 19.7992 | 2.08 × 10⁻⁵ | 2.09 × 10⁻⁵ | 1.99 × 10⁻⁵
20 | 22.7947 | 33.7939 | 19.7994 | 2.05 × 10⁻⁵ | 2.06 × 10⁻⁵ | 2.00 × 10⁻⁵
30 | 22.8020 | 33.8012 | 19.7998 | 1.97 × 10⁻⁵ | 1.97 × 10⁻⁵ | 1.99 × 10⁻⁵
40 | 22.8094 | 33.8086 | 19.8002 | 1.90 × 10⁻⁵ | 1.90 × 10⁻⁵ | 1.98 × 10⁻⁵
50 | 22.8168 | 33.8160 | 19.8126 | 1.83 × 10⁻⁵ | 1.83 × 10⁻⁵ | 1.86 × 10⁻⁵
100 | 22.8536 | 33.8528 | 19.9026 | 1.46 × 10⁻⁵ | 1.47 × 10⁻⁵ | 9.76 × 10⁻⁶
150 | 22.8904 | 33.8896 | 19.9046 | 1.09 × 10⁻⁵ | 1.11 × 10⁻⁵ | 9.69 × 10⁻⁶
200 | 22.9273 | 33.9265 | 19.9066 | 7.26 × 10⁻⁶ | 7.32 × 10⁻⁶ | 9.30 × 10⁻⁶
300 | 23.0009 | 34.0001 | 19.9106 | 9.56 × 10⁻⁸ | 8.21 × 10⁻⁸ | 8.83 × 10⁻⁶
Table 3. Results of simulation with Newton’s method for m = 10,000 iterations, with α = 23, β = 34, k = 19, θ = 0.15.

n | α̂ | β̂ | k̂ | SD(α̂) | SD(β̂) | SD(k̂)
10 | 22.5508 | 33.4207 | 19.6611 | 2.01 × 10⁻² | 3.35 × 10⁻² | 1.14 × 10⁻²
15 | 22.5508 | 33.4207 | 19.6611 | 1.34 × 10⁻² | 2.23 × 10⁻² | 7.65 × 10⁻³
20 | 22.5508 | 33.420 | 19.6684 | 1.01 × 10⁻² | 1.67 × 10⁻² | 5.49 × 10⁻³
30 | 22.7707 | 33.8507 | 19.8198 | 6.72 × 10⁻³ | 7.42 × 10⁻⁴ | 1.08 × 10⁻³
40 | 22.7707 | 33.8507 | 19.8243 | 5.04 × 10⁻³ | 5.56 × 10⁻⁴ | 7.71 × 10⁻⁴
50 | 22.7707 | 33.8507 | 19.8239 | 4.03 × 10⁻³ | 4.45 × 10⁻⁴ | 1.03 × 10⁻⁴
100 | 22.7707 | 33.8507 | 19.8198 | 2.01 × 10⁻³ | 2.22 × 10⁻⁴ | 1.03 × 10⁻⁴
150 | 22.7707 | 33.8507 | 19.8197 | 1.34 × 10⁻³ | 1.48 × 10⁻⁴ | 2.16 × 10⁻⁴
200 | 22.7707 | 33.8507 | 19.8197 | 2.62 × 10⁻⁴ | 1.11 × 10⁻⁴ | 1.62 × 10⁻⁴
300 | 22.7707 | 33.8507 | 19.8197 | 1.75 × 10⁻⁴ | 7.42 × 10⁻⁵ | 1.08 × 10⁻⁴
Table 4. MLEs of α with the conjugate gradient (GC) and gradient (G) methods: estimate (bias) and MSE.

n | α̂_GC (bias) | α̂_G (bias) | MSE_GC | MSE_G
10 | 22.787 (0.220) | 23.009786028 (−0.0097860) | 1.36 × 10⁻⁹ | 3.58 × 10⁻⁷
20 | 22.794 (0.205) | 23.009786227 (−0.0097862) | 5.44 × 10⁻⁹ | 3.58 × 10⁻⁹
30 | 22.802 (0.197) | 23.009786057 (−0.0097860) | 1.22 × 10⁻⁸ | 3.58 × 10⁻⁷
40 | 22.802 (0.197) | 23.009786022 (−0.0097860) | 1.22 × 10⁻⁸ | 3.58 × 10⁻⁷
50 | 22.809 (0.190) | 23.009786225 (−0.0097862) | 9.89 × 10⁻¹¹ | 3.58 × 10⁻⁷
100 | 22.809 (3.4 × 10⁻⁸) | 23.009786237 (−0.009786) | 9.89 × 10⁻¹¹ | 3.58 × 10⁻⁷
150 | 22.816 (0.109) | 23.009786240 (−0.009786) | 3.06 × 10⁻⁷ | 3.58 × 10⁻⁷
200 | 22.890 (0.072) | 23.009786240 (−0.009786) | 5.44 × 10⁻⁷ | 3.58 × 10⁻⁷
300 | 23.001 (−0.001) | 23.009786241 (−0.009786) | 1.22 × 10⁻⁶ | 3.58 × 10⁻⁷
1000 | 23.002 (−0.002) | 23.009786241 (−0.009786) | 1.37 × 10⁻⁵ | 3.58 × 10⁻⁷
Table 5. MLEs of β with the conjugate gradient (GC) and gradient (G) methods: estimate (bias) and MSE.

n | β̂_GC (bias) | β̂_G (bias) | MSE_GC | MSE_G
10 | 33.786 (0.006) | 33.9995 (0.000409) | 1.27 × 10⁻⁹ | 3.57 × 10⁻⁷
20 | 33.793 (0.206) | 33.9997 (0.000225) | 5.04 × 10⁻⁹ | 3.58 × 10⁻⁹
30 | 33.801 (0.198) | 33.9996 (0.000348) | 8.76 × 10⁻⁹ | 3.57 × 10⁻⁷
40 | 33.801 (0.198) | 33.9996 (0.000364) | 1.22 × 10⁻⁸ | 3.57 × 10⁻⁷
50 | 33.808 (0.191) | 33.9997 (0.000229) | 2.13 × 10⁻⁸ | 3.58 × 10⁻⁷
100 | 33.816 (0.183) | 33.9997 (0.00022) | 1.65 × 10⁻¹⁰ | 3.58 × 10⁻⁷
150 | 33.889 (0.110) | 33.9997 (0.000217) | 3.06 × 10⁻⁷ | 3.58 × 10⁻⁷
200 | 33.926 (0.073) | 33.9997 (0.000218) | 5.41 × 10⁻⁷ | 3.58 × 10⁻⁷
300 | 34.000 (−0.0006) | 33.9997 (0.000217) | 1.21 × 10⁻⁶ | 3.58 × 10⁻⁷
1000 | 34.001 (0.001) | 33.9997 (0.000217) | 1.37 × 10⁻⁵ | 3.58 × 10⁻⁷
Table 6. MLEs of k with the conjugate gradient (GC) and gradient (G) methods: estimate (bias) and MSE.

n | k̂_GC (bias) | k̂_G (bias) | MSE_GC | MSE_G
10 | 19.989 (0.0001) | 20.046 (−0.046) | 1.98 × 10⁻¹¹ | 4.40 × 10⁻⁸
20 | 19.989 (0.0105) | 20.037 (−0.037) | 1.48 × 10⁻¹¹ | 3.50 × 10⁻⁹
30 | 19.989 (0.0101) | 20.042 (−0.042) | 1.74 × 10⁻¹⁰ | 3.82 × 10⁻⁷
40 | 19.989 (0.0102) | 20.041 (−0.041) | 1.43 × 10⁻¹⁰ | 3.77 × 10⁻⁷
50 | 19.990 (0.0096) | 20.037 (0.037) | 1.00 × 10⁻¹⁰ | 3.52 × 10⁻⁷
100 | 19.990 (0.0094) | 20.037 (0.037) | 9.89 × 10⁻¹¹ | 3.53 × 10⁻⁷
150 | 19.994 (0.0054) | 20.037 (0.037) | 1.65 × 10⁻¹⁰ | 3.53 × 10⁻⁷
200 | 19.996 (0.0032) | 20.037 (0.037) | 1.16 × 10⁻⁹ | 3.53 × 10⁻⁷
300 | 23.001 (−0.0014) | 20.037 (−0.037) | 1.67 × 10⁻⁹ | 3.58 × 10⁻⁷
1000 | 20.000 (−0.0006) | 20.037 (−0.037) | 3.10 × 10⁻⁹ | 3.53 × 10⁻⁷
Table 7. Smaller AIC/BIC/AICc scores in 1000 simulations from a GBP distribution GBP(α, β, k, θ, c) with α = 2.7, β = 1.2, k = 5.5, θ = 0.15, c = 0.825. For each criterion (AIC, BIC, AICc), the columns give the number of times each candidate model (BP, B, P, G, GBP) is selected.

n | AIC: BP, B, P, G, GBP | BIC: BP, B, P, G, GBP | AICc: BP, B, P, G, GBP
102227122119262297191317222277535312
20156784131730179710713911767842330
3014556710152631556034112271517841757
50962788126061063073757711160405244
1004348009094360008974323000727
200380009623800096238000959
300200009802000098020000980
500500099550009955000995
1000000010000000100000001000
Table 8. Smaller AIC/BIC/AICc scores in 1000 simulations from a Pareto distribution P(k, θ) with k = 0.075, θ = 0.15. For each criterion (AIC, BIC, AICc), the columns give the number of times each candidate model (BP, B, P, G, GBP) is selected.

n | AIC: BP, B, P, G, GBP | BIC: BP, B, P, G, GBP | AICc: BP, B, P, G, GBP
7026778331812111915548231096421
80123786019919290916913497616
904078002167091208130997010
1000097003000995050099208
1500097402400996040099703
2000098002000996040099802
3000098401600997030099802
500009920800995050099802
1000001000000010000000100000
Table 9. Smaller AIC/BIC/AICc scores in 1000 simulations from a Gamma distribution G(α, β) with α = 0.5, β = 0.9. For each criterion (AIC, BIC, AICc), the columns give the number of times each candidate model (BP, B, P, G, GBP) is selected.

n | AIC: BP, B, P, G, GBP | BIC: BP, B, P, G, GBP | AICc: BP, B, P, G, GBP
10092988109298810929890
20313994728421129723276600385
30212589385413995717165793195
5011158231601849622504295044
1000937792090519557201098019
1500008071930009148600097723
20000077622400089810200098812
500000843157000906940009982
100000091585000949510009982
200000097525000982180009982
Table 10. Smaller AIC/BIC/AICc scores in 1000 simulations from a BP distribution BP(α, β, k, θ) with α = 0.5, β = 0.9, k = 0.8, θ = 0.15. For each criterion (AIC, BIC, AICc), the columns give the number of times each candidate model (BP, B, P, G, GBP) is selected.

n | AIC: BP, B, P, G, GBP | BIC: BP, B, P, G, GBP | AICc: BP, B, P, G, GBP
10560190421055820042205252904460
20648103510645103540620303770
3048103510645103540620303770
50648103510645103540620303770
100873001270871001290820001800
150890001100888001120844001560
2009310069093100690892001080
500962003809620038092800720
1000999001099900109980020
2000100000001000000010000000
Table 11. Results of simulation with samples of censored data using the conjugate gradient method with m = 10,000, α = 13, β = 23, k = 18, θ = 0.15.

n | α̂ | β̂ | k̂ | SD(α̂) | SD(β̂) | SD(k̂)
10 | 12.0111 | 23.5087 | 17.7891 | 1.701444 | 0.766839 | 0.036204
15 | 12.0541 | 23.8580 | 17.7891 | 0.733933 | 0.659689 | 0.036204
20 | 12.0967 | 23.9454 | 17.7891 | 0.183483 | 0.476273 | 0.036204
30 | 12.0106 | 23.4682 | 18.0907 | 0.117429 | 0.067367 | 0.036204
40 | 12.2676 | 23.5793 | 18.0907 | 0.007339 | 0.137393 | 0.036204
50 | 12.3533 | 23.1487 | 18.0907 | 8.39 × 10⁻⁵ | 8.67 × 10⁻⁵ | 4.71 × 10⁻⁶
100 | 12.7817 | 23.0266 | 18.0907 | 8.47 × 10⁻⁵ | 8.83 × 10⁻⁵ | 4.71 × 10⁻⁶
150 | 12.8293 | 22.9632 | 18.0907 | 8.55 × 10⁻⁵ | 9.50 × 10⁻⁵ | 4.71 × 10⁻⁶
200 | 12.8940 | 22.8416 | 17.8626 | 4.78 × 10⁻⁷ | 4.01 × 10⁻⁷ | 8.09 × 10⁻⁷
300 | 12.9777 | 22.7847 | 17.8277 | 1.58 × 10⁻⁸ | 4.66 × 10⁻⁸ | 3.33 × 10⁻¹⁰
500 | 12.9785 | 22.8175 | 17.8626 | 2.85 × 10⁻⁸ | 9.44 × 10⁻⁹ | 3.35 × 10⁻¹⁰
1000 | 13.0002 | 23.0004 | 17.0073 | 7.00 × 10⁻⁹ | 1.19 × 10⁻¹⁰ | 5.69 × 10⁻⁹
Table 12. Results of simulation with samples of censored data using the gradient method with m = 10,000, α = 13, β = 23, k = 18, θ = 0.15.

n | α̂ | β̂ | k̂ | SD(α̂) | SD(β̂) | SD(k̂)
10 | 12.8685 | 22.9473 | 15.9886 | 0.000319 | 0.000527 | 0.001948
15 | 12.9168 | 22.9444 | 16.8106 | 7.03 × 10⁻⁶ | 9.83 × 10⁻⁵ | 2.82 × 10⁻²
20 | 12.9336 | 22.9669 | 16.2010 | 2.11 × 10⁻⁵ | 1.00 × 10⁻⁴ | 9.67 × 10⁻³
30 | 12.9336 | 23.0119 | 16.8935 | 2.11 × 10⁻⁵ | 5.79 × 10⁻⁶ | 3.35 × 10⁻²
40 | 12.9336 | 23.0095 | 16.8626 | 2.12 × 10⁻⁵ | 3.88 × 10⁻⁶ | 3.15 × 10⁻²
50 | 12.9420 | 23.0256 | 17.4303 | 2.82 × 10⁻⁵ | 1.80 × 10⁻⁵ | 5.93 × 10⁻²
100 | 12.9504 | 22.9386 | 17.4201 | 3.52 × 10⁻⁵ | 7.21 × 10⁻⁵ | 4.68 × 10⁻²
150 | 12.9588 | 22.9327 | 17.7439 | 4.22 × 10⁻⁵ | 7.24 × 10⁻⁵ | 5.72 × 10⁻²
200 | 12.9672 | 22.9290 | 18.0424 | 4.93 × 10⁻⁵ | 6.92 × 10⁻⁵ | 6.61 × 10⁻²
300 | 12.9756 | 22.9404 | 18.3535 | 5.63 × 10⁻⁵ | 4.24 × 10⁻⁵ | 7.58 × 10⁻²
500 | 12.9756 | 22.9381 | 19.7080 | 5.63 × 10⁻⁵ | 4.08 × 10⁻⁵ | 8.82 × 10⁻²
1000 | 12.9840 | 23.0030 | 19.7933 | 6.35 × 10⁻⁵ | 2.29 × 10⁻⁷ | 1.05 × 10⁻¹
Table 13. Gradient method: mean, bias and MSE of the estimators (censored data).

 | α̂ | β̂ | k̂
E(V̂) | 12.9450 | 22.9656 | 17.6040
bias | −0.0242 | −0.0349 | 0.3027
MSE | 5.66 × 10⁻⁵ | 1.41 × 10⁻⁵ | 7.27 × 10⁻²
Table 14. Conjugate gradient method: mean, bias and MSE of the estimators (censored data).

 | α̂ | β̂ | k̂
E(V̂) | 13.0029 | 23.002452 | 17.9714
bias | −0.0790 | 0.0261 | 0.0328
MSE | 1.82 × 10⁻⁷ | 3.90 × 10⁻⁹ | 3.42 × 10⁻¹⁰
Table 15. Smaller AIC/BIC/AICc scores in 1000 simulations from a censored Pareto distribution P(k, θ) with k = 7.5, θ = 0.15. For each criterion (AIC, BIC, AICc), the columns give the number of times each candidate model (BP, B, P, G, GBP) is selected.

n | AIC: BP, B, P, G, GBP | BIC: BP, B, P, G, GBP | AICc: BP, B, P, G, GBP
10710233745569233748410423373815
2051241756333641757047641755911
30175614292565614253265614283
50057342610017342650027342640
100029128600091288001912870
200009396100093961000939610
500009732700097327000973270
1000009964000996400099640
Table 16. Smaller AIC/BIC/AICc scores in 1000 simulations from a censored Gamma distribution G(α, β) with α = 2, β = 1. For each criterion (AIC, BIC, AICc), the columns give the number of times each candidate model (BP, B, P, G, GBP) is selected.

n | AIC: BP, B, P, G, GBP | BIC: BP, B, P, G, GBP | AICc: BP, B, P, G, GBP
10015294700152947001529470
20006493600064936000649360
30015294700152947001529470
50006393700063937000639370
100015694300156943001569430
200025294600252946002529460
500015394600053947001539460
1000016193800061939001619380
Table 17. Smaller AIC/BIC/AICc scores in 1000 simulations from a censored BP distribution BP(α, β, k, θ) with α = 2.7, β = 1.9, k = 0.48, θ = 0.15. For each criterion (AIC, BIC, AICc), the columns give the number of times each candidate model (BP, B, P, G, GBP) is selected.

n | AIC: BP, B, P, G, GBP | BIC: BP, B, P, G, GBP | AICc: BP, B, P, G, GBP
10453112550294347245241142112225797
206008153707532615438953115104395
306367934445985838276111263692
50704362861665273233688723021
100711002890680003200699003010
200714002860682003180708002980
500741002590736002640741002590
1000752002480748002520752002480
2000755002450753002470755002450
Table 18. Smaller AIC/BIC/AICc scores in 1000 simulations from a censored GBP distribution GBP(α, β, k, θ, c) with α = 1.5, β = 0.25, k = 0.075, θ = 0.15, c = 0.001125. For each criterion (AIC, BIC, AICc), the columns give the number of times each candidate model (BP, B, P, G, GBP) is selected.

n | AIC: BP, B, P, G, GBP | BIC: BP, B, P, G, GBP | AICc: BP, B, P, G, GBP
10501099450109945010994
20407098930709905060989
3011010988901099011010988
50190109801901098019010980
100240109752401097524010975
200330209653302096534020964
300530409435104094552040944
500760109237501092476010923
1000122020876118020880123000877
2000238020760237020761237020761
