Estimation of Beta-Pareto Distribution Based on Several Optimization Methods

Boumaraf, Badreddine; Seddik-Ameur, Nacira; Barbu, Vlad Stefan

doi:10.3390/math8071055

Open AccessArticle

Estimation of Beta-Pareto Distribution Based on Several Optimization Methods

by

Badreddine Boumaraf

^1,2,3,†

,

Nacira Seddik-Ameur

^2,† and

Vlad Stefan Barbu

^3,*,†

¹

Department of Mathematics and Informatics, University of Souk-Ahras, Souk Ahras 41000, Algeria

²

Laboratory of Probability and Statistics LaPS, University of Badji Mokhtar of Annaba, Annaba 23000, Algeria

³

Laboratory of Mathematics Raphaël Salem, University of Rouen-Normandy, 76801 Saint Étienne du Rouvray, France

^*

Author to whom correspondence should be addressed.

^†

The authors contributed equally to this work.

Mathematics 2020, 8(7), 1055; https://doi.org/10.3390/math8071055

Submission received: 28 February 2020 / Revised: 3 June 2020 / Accepted: 10 June 2020 / Published: 1 July 2020

(This article belongs to the Special Issue Probability, Statistics and Their Applications)

Download

Browse Figures

Versions Notes

Abstract

:

This paper is concerned with the maximum likelihood estimators of the Beta-Pareto distribution introduced in Akinsete et al. (2008), which comes from the mixing of two probability distributions, Beta and Pareto. Since these estimators cannot be obtained explicitly, we use nonlinear optimization methods that numerically provide these estimators. The methods we investigate are the method of Newton-Raphson, the gradient method and the conjugate gradient method. Note that for the conjugate gradient method we use the model of Fletcher-Reeves. The corresponding algorithms are developed and the performances of the methods used are confirmed by an important simulation study. In order to compare between several concurrent models, namely generalized Beta-Pareto, Beta, Pareto, Gamma and Beta-Pareto, model criteria selection are used. We firstly consider completely observed data and, secondly, the observations are assumed to be right censored and we derive the same type of results.

Keywords:

maximum likelihood estimators; nonlinear optimization methods; Beta-Pareto distribution; Beta distribution; Pareto distribution; model selection; right-censored data

1. Introduction

In this work we are interested in the four-parameter distribution called Beta-Pareto (BP) distribution, introduced recently by Akinsete et al. (2008) [1]. This distribution is a generalization of several models like Pareto, logbeta, exponential, arcsine distributions, in the sense that these distributions can be considered as special cases of the BP distribution, by making transformation of the variable or by setting special values of the parameters (cf. Reference [1]).

It is well known that the family of Pareto distribution and the corresponding generalizations have been extensively used in various fields of applications, like income data [2], environmental studies [3] or economical-social studies [4]. Concerning the generalizations of the Pareto distribution, we can cite the Burr distribution [5], the power function [6] and the logistic distribution. It is important also to stress that heavy tailed phenomena can be successfully modelled by means of (generalized) Pareto distributions [7]. Note that this BP model is based on Pareto distribution which is known to be heavy tailed, as it was shown in Reference [1] using some theoretical and application arguments. We would like to stress that this is important in practice because it can be used to describe skewed data better than other distributions proposed in the statistical literature. A classical example of this kind is given by the exceedances of Wheaton River flood data which are highly skewed to the right.

All these reasons show the great importance of statistically investigating any generalization of Pareto distributions. Thus the purpose of our work is to provide algorithmic methods for practically computing the maximum likelihood estimators (MLEs) of the parameters of the BP distribution, to carry out intensive simulation studies in order to investigate the quality of the proposed estimators, to consider model selection criteria in order to choose among candidate models and also to take into account right-censored data, not only completely observed data. Note that right-censored data are often observed in reliability studies and survival analysis, where experiments are generally conducted over a fixed time period. At the end of the study, generally we cannot observe the failure times of all the products or death (remission) of all patients due to loss to follow-up, competing risks (death from other causes) or any combination of these. In such cases, we do not always have the opportunity of observing these survival times, observing only the censoring times. Consequently, it is of crucial importance for this type of data and applications to develop a methodology that allows the estimation in the presence of right censoring.

Generally, numerical methods are required when the MLEs of the unknown parameters of any model cannot be obtained. For the BP distribution, we propose the use of several nonlinear optimization methods: Newton-Raphson’ method, the gradient and the conjugate gradient methods. For the Newton-Raphson’s method, the evaluation of the approximation of the Hessian matrix is widely studied in the literature. In this work, we are interested in the BFGS method (from Broyden-Fletcher-Goldfarb-Shanno), the DFP method (from Davidon-Fletcher-Powell) and the SR1 method (Symmetric Rank 1). For the conjugate gradient method we use the model of Fletcher-Reeves.

It is well known that the gradient and conjugate gradient methods are better than the Newton-Raphson’s method. Nonetheless, most statistical studies use the Newton-Raphson’s method instead of the gradient or conjugate gradient methods, maybe because it is much more easier to put in practice. Our interest is to put in practice the gradient method and the conjugate gradient method in our framework of BP estimation and also present the Newton-Raphson method for comparison purposes only.

The structure of the article is as follows. In the next section we introduce the BP distribution and we give some elements of MLE for such a model. Section 3 is devoted to the numerical optimization methods used for obtaining the MLEs for the parameters of interest. Firstly, we briefly recall the numerical methods that we use (Newton-Raphson, gradient and conjugate gradient). Secondly, we present the corresponding algorithm for the method of conjugate gradient, which is the most complex one. We end this section by investigating through simulations the accuracy of these three methods. In Section 4 we use model selection criteria (AIC, BIC, AICc) in order to chose between several concurrent models, namely between the Generalized Beta-Pareto, Beta, Pareto, Gamma, BP models respectively. In Section 5 we assume that we have at our disposal right-censored observations and we derive the same type of results as we did for complete observations.

2. BP Distribution and MLEs of the Parameters

A random variable X has a BP distribution with parameters

α,

β

,

θ,

k > 0,

if its probability density function is

f (x; α, β, k, θ) : = \frac{k}{θ B (α, β)} {[1 - {(\frac{x}{θ})}^{- k}]}^{α - 1} {(\frac{x}{θ})}^{- k β - 1}, x \geq θ,

with

B (α, β) = \frac{Γ (α) Γ (β)}{Γ (α + β)},

Γ (x), x > 0,

being the Gamma function. Consequently, the support of X is

[θ, + \infty) .

The corresponding cumulative distribution function can be written as

F (x; α, β, k, θ) = 1 - \frac{{(\frac{x}{θ})}^{- k β}}{β B (α, β)} - \frac{{(\frac{x}{θ})}^{- k β}}{B (α, β)} \sum_{n = 1}^{\infty} \frac{\prod_{i = 1}^{n} (i - α)}{n! (β + n)} {(\frac{x}{θ})}^{- k n}, x \geq θ, α, β, θ, k > 0 .

In Figure 1, Figure 2, Figure 3, Figure 4 and Figure 5, the pdf, survival, cdf, hazard and cumulative hazard functions are presented for several values of the parameters.

Let us now consider an i.i.d. (independent and identically distributed) sample

(x_{1}, x_{2}, x_{3}, \dots, x_{n})

of a random variable X following a BP distribution with density

f (x; α, β, k, θ) .

The corresponding log-likelihood function can be written as

\begin{matrix} l (α, β, k, θ ∣ x_{1}, x_{2}, x_{3}, \dots, x_{n}) = n ln k - n ln θ + n (ln Γ (α + β) - ln Γ (α) - ln Γ (β)) \\ + (α - 1) \sum_{i = 1}^{n} ln [1 - {(\frac{x_{i}}{θ})}^{- k}] - (k β + 1) \sum_{i = 1}^{n} ln (\frac{x_{i}}{θ}) . \end{matrix}

(1)

As usual, when we need to see the log-likelihood function as a function of the parameters

α,

β,

k,

θ

only, we shall write

l (α, β, k, θ)

instead of

l (α, β, k, θ ∣ x_{1}, x_{2}, \dots, x_{n}) .

The score equations are thus given by

\frac{\partial l (α, β, k, θ)}{\partial α} = n [Ψ (α + β) - Ψ (α)] + \sum_{i = 1}^{n} ln (1 - {(\frac{x_{i}}{θ})}^{- k}) = 0,

(2)

\frac{\partial l (α, β, k, θ)}{\partial β} = n [Ψ (α + β) - Ψ (β)] - k \sum_{i = 1}^{n} ln (\frac{x_{i}}{θ}) = 0,

(3)

\frac{\partial l (α, β, k, θ)}{\partial k} = \frac{n}{k} - \sum_{i = 1}^{n} [β + (α - 1) {(1 - {(\frac{x_{i}}{θ})}^{k})}^{- 1}] ln (\frac{x_{i}}{θ}) = 0,

(4)

where the function

Ψ

is defined by

Ψ (x) : = \frac{Γ^{'} (x)}{Γ (x)}, x > 0 .

As

x \geq θ

, the MLE of the parameter

θ

is the first order statistic

x_{(1)}

and the MLEs of

α,

β

and k are obtained by solving the system (2)–(4).

Using Equations (2)–(4) we compute the elements of the Fisher information matrix as follows [1]:

\begin{matrix} \frac{\partial^{2} l (α, β, k, θ)}{\partial α^{2}} = n [Ψ^{'} (α + β) - Ψ^{'} (α)], \\ \frac{\partial^{2} l (α, β, k, θ)}{\partial α \partial β} = n Ψ^{'} (α + β), \\ \frac{\partial^{2} l (α, β, k, θ)}{\partial k \partial β} = - \sum_{i = 1}^{n} ln (\frac{x_{i}}{θ}), \\ \frac{\partial^{2} (l (α, β, k, θ))}{\partial k \partial α} = \sum_{i = 1}^{n} ln (\frac{x_{i}}{θ}) {[{(\frac{x_{i}}{θ})}^{k} - 1]}^{- 1}, \\ \frac{\partial^{2} (l (α, β, k, θ))}{\partial k^{2}} = - \frac{n}{k^{2}} - (α - 1) \sum_{i = 1}^{n} {(\frac{x_{i}}{θ})}^{k} {[\frac{ln (\frac{x_{i}}{θ})}{1 - {(\frac{x_{i}}{θ})}^{k}}]}^{2}, \\ \frac{\partial^{2} (l (α, β, k, θ))}{\partial β^{2}} = n [Ψ^{'} (α + β) - Ψ^{'} (β)] . \end{matrix}

There are no closed formulas for solving these equations, so numerical methods are required.

3. Numerical Optimization for Solving the Score Equations

In this section we are interested in proposing numerical methods for solving the score equations of the MLEs of the BP parameters. We will consider three methods: the Newton-Raphson’s method, the gradient method and the conjugate gradient method. Firstly, we will recall the general lines of these optimization methods. Secondly, we will give a detailed description of the application of the most complex method, namely of the conjugate gradient one, for computing the MLEs of the BP distribution. We close the section by presenting numerical results of the implementation of these methods in our BP framework.

3.1. General Description of Optimization Methods

Les us first recall the main lines of the three numerical methods that we have considered. The objectives of these methods will be to minimize/maximize a function

l : R^{p} \to R;

in our case,

p = 3

(the three parameters of the BP distribution to be estimated,

α,

β,

k). As it was mentioned in the previous section, although the number of parameters of the BP distribution is 4 (the parameters are

α,

β,

k and

θ

), the MLE of the forth parameter

θ

can be immediately obtained as the first order statistics,

\hat{θ} = x_{(1)} .

For this reason, although the function l has 4 arguments, in the optimization methods only 3 will be in fact involved.

3.1.1. Newton-Raphson’s Method

Newton’s method: Let us denote by

V : = (α, β, k, θ)

a current point of the function

l,

let

\nabla l (V_{k})

be the gradient and

H_{K}

be the Hessian function at an iteration

V_{k}

of the current point

V .

Newton’s method consists in taking the descent direction

d_{k} = - H_{k}^{- 1} \nabla l (V_{k}) .

Near-Newton method: Often in practice, the inverse of the Hessian,

H_{K}^{- 1},

is very difficult to evaluate when the function l is not analytic. The gradient is always more or less accessible (by inverse methods). As the Hessian cannot be computed exactly, we try to evaluate an approximation. Among the methods that approximate the Hessian, three are retained here: the method

B F G S

(for Broyden-Fletcher-Goldfarb-Shanno), the method

D F P

(for Davidon-Fletcher-Powell) and the method

S R 1

(for Symmetric Rank 1 method) [8].

We introduce the notation

s_{k} = V_{k + 1} - V_{k}

and

y_{k} = \nabla l (V_{k + 1}) - \nabla l (V_{k})

and we choose

H_{0}

a definite positive matrix; for convexity reasons the identity matrix is usually chosen.

Update of the Hessian by the method BFGS: In this approach, the approximation of the Hessian is given by

H_{k + 1} = H_{k} + \frac{y_{k} y_{k}^{T}}{y_{k}^{T} s_{k}} - \frac{H_{k} s_{k} s_{k}^{T} H_{k}}{s_{k}^{T} H_{k} s_{k}} .

3.1.2. Gradient Method

This algorithm is based on the fact that, in the vicinity of a point V, the function l decreases most strongly in the opposed direction of the gradient of

l,

d = - \nabla l (V) .

Let us fix an arbitrary

ϵ > 0 .

The algorithm can be described as follows:

Step 0 (Initialization): $V_{0} = (α_{0}, β_{0}, k_{0}, θ_{0})$ that satisfies the conditions for the parameters of the BP distribution; set $k = 0$ and go to Step 1
Step 1: Computation of $d_{k} = - \nabla l (V_{k})$ ; if $‖ \nabla l (V_{k}) ‖ \leq ϵ,$ then STOP; If not, go to Step 2.
Step 2: Compute $λ_{k}$ the solution of the problem

$min_{λ \in R^{+}} l (V_{k} + λ d_{k})$

and $V_{k + 1},$

$V_{k + 1} = V_{k} - λ_{k} \nabla l (V_{k}) .$

Set $k = k + 1$ and go to Step 1.

The algorithm that uses this direction of descent is called the gradient algorithm or the deepest descent algorithm. The algorithm generally requires a finite number of iterations, depending on the regularity of the function

l .

In practice, we often observe that

\nabla l (V)

is a good direction of descent, but the convergence to the solution is generally slow. Despite its poor numerical performance, this algorithm is worth studying.

3.1.3. Conjugate Gradient Method

The conjugate gradient method is one of the most famous and widely used methods for solving minimization problems. It is mostly used for big size problems. This method was proposed in 1952 by Hestenes and Steifel for the minimization of strictly convex quadratic functions [9].

Many mathematicians have used this method for the nonlinear (non-quadratic) case. This was done for the first time in 1964 by Fletcher and Reeves [10], then in 1969 by Polak and Ribiere [11]; another variant was studied in 1987 by Fletcher [12]. The strategy adopted was to use a recursive sequence

V_{k + 1} = V_{k} + λ_{k} d_{k},

where

λ_{k}

is a properly chosen positive real constant called “step” and

d_{k}

is a non-zero real vector called “direction”. As the conjugate gradient algorithm is used to solve nonlinear functions, we should note that nonlinear conjugate gradient methods are not unique.

Algorithm for the conjugate gradient method:

The Algorithm is initialized by the step 0 of the simple gradient.

0. We have

d_{0} = - \nabla l (V_{0})

as long as a convergence criterion is not verified.

1. Determination of a step

λ_{k}

by some linear search method. Computation of a new iteration

V_{k + 1} = V_{k} + λ_{k} d_{k} .

2. Evaluation of a new gradient

\nabla l (V_{k + 1}) .

3. Computation of the real

ω_{k + 1}

by some methods, for example Fletcher and Reeves (see below),

ω_{k + 1} = \frac{\nabla l {(V_{k + 1})}^{T} \nabla l (V_{k + 1})}{\nabla l {(V_{k})}^{T} \nabla l (V_{k})} .

(5)

4. Construction of a new descent direction:

d_{k + 1} = - \nabla l (V_{k + 1}) + ω_{k + 1} d_{k} .

5. Increment

k = k + 1 .

Several methods exist for computing the term

ω_{k + 1};

we will be concerned in the sequel by the methods of Fletcher-Reeves, of Polack-Ribiere, and of Hesteness-Stiefel (a variant of the method of Fletcher-Reeves). It should be noted that this last method is particularly effective in the case when the function is quadratic and when the linear search is carried out exactly. The corresponding

ω_{k + 1}

for these methods is given by:

Fletcher-Reeves

ω_{k + 1} = \frac{\nabla l {(V_{k + 1})}^{T} \nabla l (V_{k + 1})}{\nabla l {(V_{k})}^{T} \nabla l (V_{k})};

Polack-Ribiere

ω_{k + 1} = \frac{\nabla l {(V_{k + 1})}^{T} (\nabla l (V_{k + 1}) - \nabla l (V_{k}))}{\nabla l {(V_{k})}^{T} \nabla l (V_{k})};

Hestenes-Stiefel

ω_{k + 1} = \frac{\nabla l {(V_{k + 1})}^{T} (\nabla l (V_{k + 1}) - \nabla l (V_{k}))}{{(\nabla l (V_{k + 1}) - \nabla l (V_{k}))}^{T} \nabla l (V_{k})} .

Noting that

l (V_{k + 1}) = l (V_{k}) + (V_{k + 1} - V_{k}) \frac{d l}{d V_{k}} (V_{k}) + R e s t (V_{k + 1} - V_{k}),

(6)

we have the recurrence

V_{k + 1} = V_{k} + λ_{k} d_{k},

where the step of descent

λ_{k}

is obtained with an exact or inaccurate linear search. The descent vector

d_{k}

is obtained by using a conjugate gradient algorithm based on the recurrence formula

d_{k} = - \nabla l (V_{k}) - ω_{k} d_{k - 1},

where

ω_{k} = \frac{∥ \nabla l (V_{k}) ∥^{2}}{∥ \nabla l (V_{k - 1}) ∥^{2}} .

3.2. Optimization Methods for Computing the MLEs of the BP Distribution

At this stage, we are interested in describing the application of the three optimization methods for computing the MLEs of the BP distribution. In the sequel we will present in details only the most complex one, namely the conjugate gradient method. Following the same line, the other two methods can also be adapted for computing the MLEs of the BP distribution. Note also that we will give in the next section numerical results for all three methods.

So, our purpose is to numerically determine the MLEs of the BP parameters by calculating the maximum of the log-likelihood function

l (α, β, k, θ)

given in (1).

As we have previously mentioned, the MLE of the parameter

θ

is the first order statistic, so, in the sequel, the iterates

θ_{0} = θ_{1} = \dots = x_{(1)}

are fixed.

To compute the MLEs of the parameters, we proceed as follows:

1. Step 0: Initialization

V_{0} = (α_{0}, β_{0}, k_{0}, θ_{0}) .

2. Step 1: Computation of the gradient

\begin{matrix} \nabla l (α, β, k, θ) = (\begin{matrix} \frac{\partial l (α, β, k, θ)}{\partial α} \\ \frac{\partial l (α, β, k, θ)}{\partial β} \\ \frac{\partial l (α, β, k, θ)}{\partial k} \end{matrix}) \\ = & (\begin{matrix} n [Ψ (α + β) - Ψ (α)] + \sum_{i = 1}^{n} ln (1 - {(\frac{x_{i}}{θ})}^{- k}) \\ n [Ψ (α + β) - Ψ (β)] - k \sum_{i = 1}^{n} ln (\frac{x_{i}}{θ}) \\ \frac{n}{k} - \sum_{i = 1}^{n} [β + (α - 1) {(1 - {(\frac{x_{i}}{θ})}^{k})}^{- 1}] ln (\frac{x_{i}}{θ}) \end{matrix}) . \end{matrix}

If

\nabla l (α, β, k, θ) \leq ϵ,

then stop,

V_{0} = (α_{0}, β_{0}, k_{0}, θ_{0})

is the optimal vector.

Else: go to the next step.

3. Step 2: Computation of the direction of descent

If

i = 0 :

d_{0} = - \nabla l (α_{0}, β_{0}, k_{0}, θ_{0}) = - (\begin{matrix} n [Ψ (α_{0} + β_{0}) - Ψ (α_{0})] + \sum_{i = 1}^{n} ln (1 - {(\frac{x_{i}}{θ_{0}})}^{- k_{0}}) \\ n [Ψ (α_{0} + β_{0}) - Ψ (β_{0})] - k_{0} \sum_{i = 1}^{n} ln (\frac{x_{i}}{θ_{0}}) \\ \frac{n}{k_{0}} - \sum_{i = 1}^{n} [β_{0} + (α_{0} - 1) {(1 - {(\frac{x_{i}}{θ_{0}})}^{k_{0}})}^{- 1}] ln (\frac{x_{i}}{θ_{0}}) \end{matrix}) .

If

i > 0 :

d_{i + 1} = - \nabla l (α_{i}, β_{i}, k_{i}, θ_{i}) + ω_{i} d_{i};

when we use the method of Fletcher-Reeves (5), we will have the result

ω_{i}^{F R} = \frac{∥ \nabla l (α_{i + 1}, β_{i + 1}, k_{i + 1}, θ_{i + 1}) ∥^{2}}{∥ \nabla l (α_{i}, β_{i}, k_{i}, θ_{i}) ∥^{2}} .

4. Step 3: Computation of

V_{1} = (α_{1}, β_{1}, k_{1}, θ_{1})

V_{1} = V_{0} + λ_{0} d_{0}

such as

d_{0} = - \nabla l (α_{0}, β_{0}, k_{0}, θ_{0}) .

Determination of a step $λ_{0}$

We can find

λ_{0}

with exact and inaccurate linear search methods. In our case, the use of the exact linear search helps us to have a fast convergence. The exact linear search method is to solve the problem

min_{λ_{0} \in R^{+}} l (V_{0} + λ_{0} d_{0}) .

(7)

For this, we will look for the value of

λ_{0}

which cancels the first derivative of the function

l (V_{0} + λ_{0} d_{0}),

\begin{matrix} α_{0} - λ_{0} \{n [Ψ (α_{0} + β_{0}) - Ψ (α_{0})] + \sum_{i = 1}^{n} ln (1 - {(\frac{x_{i}}{θ_{0}})}^{- k_{0}})\} & = & 0, \\ β_{0} - λ_{0} \{n [Ψ (α_{0} + β_{0}) - Ψ (β_{0})] - k_{0} \sum_{i = 1}^{n} ln (\frac{x_{i}}{θ_{0}})\} & = & 0, \\ k_{0} - λ_{0} \{\frac{n}{k_{0}} - \sum_{i = 1}^{n} [β_{0} + (α_{0} - 1) {(1 - {(\frac{x_{i}}{θ_{0}})}^{k_{0}})}^{- 1}] ln (\frac{x_{i}}{θ_{0}})\} & = & 0 . \end{matrix}

Thus we can deduce the exact value of

λ_{0}

\begin{matrix} λ_{0} & = & (α_{0} + β_{0} + k_{0}) / [2 n Ψ (α_{0} + β_{0}) - n Ψ (α_{0}) - n Ψ (β_{0}) - k_{0} \sum_{i = 1}^{n} ln (\frac{x_{i}}{θ_{0}}) \\ + \sum_{i = 1}^{n} ln (1 - {(\frac{x_{i}}{θ_{0}})}^{- k_{0}}) + \frac{n}{k_{0}} - \sum_{i = 1}^{n} [β_{0} + (α_{0} - 1) {(1 - {(\frac{x_{i}}{θ_{0}})}^{k_{0}})}^{- 1}] ln (\frac{x_{i}}{θ_{0}})] . \end{matrix}

If

λ_{0}

is positive, we accept it and we go to next step. If not, we use inexact methods of linear search to have an approximation of the optimal value

λ_{0}

and we go to next step. The main inexact methods are the so-called inaccurate linear search methods of Armijo [13], of Goldstein [14], of Wolfe [15] and of Wolfe strong [16].

Construction of the vector $V_{1}$

V_{1} = V_{0} + λ_{0} d_{0} = (\begin{matrix} α_{0} + λ_{0} [- n {Ψ (α_{0} + β_{0}) - Ψ (α_{0})} - \sum_{i = 1}^{n} ln (1 - {(\frac{x_{i}}{θ_{0}})}^{- k_{0}})] \\ β_{0} + λ_{0} [- n {Ψ (α_{0} + β_{0}) - Ψ (β_{0})} + k_{0} \sum_{i = 1}^{n} ln (\frac{x_{i}}{θ_{0}})] \\ k_{0} + λ_{0} [- \frac{n}{k_{0}} + \sum_{i = 1}^{n} [β_{0} + (α_{0} - 1) {(1 - {(\frac{x_{i}}{θ_{0}})}^{k_{0}})}^{- 1} ln (\frac{x_{i}}{θ_{0}})]] \end{matrix}),

where

λ_{0}

was previously obtained.

Set

i = i + 1

and go to Step 1.

Based on Zoutendijk’s theorem [17] and globally convergent Riemannian conjugate gradient method [18], the convergence of the algorithm is ensured.

3.3. Numerical Results

We carried out an important simulation study using the programming software R. In the sequel we present the results obtained by means of the three optimization methods.

In Table 1, Table 2 and Table 3, the MLEs of the parameters

α,

β

and k of the BP distribution are presented, together with the standard deviations (SDs), considering the three optimization methods. Note the convergence of the estimators obtained with the gradient and conjugate gradient methods and note also that the Newton’s method does not converge with a non-quadratic or nonlinear function.

The Table 4, Table 5 and Table 6 present the MLEs of the parameters for the conjugate gradient method and gradient methods, as well as the bias and the mean square errors (MSEs) of the estimators. Note that the MSEs of the estimators have very small values, smaller than

10^{- 4} .

The interest of the method of conjugated gradient comes from the fact that it converges quickly towards the minimum; we can show that in N dimensions it only needs maximum N calculation steps, if the function is exactly quadratic. The inconvenient of the Newton’s method is that it requires to know the Hessian of the function in order to determine the descent step. We have seen that the conjugate gradient algorithm chooses optimally the descent directions through

V (α, β, k, θ)

.

In Figure 6, Figure 7 and Figure 8 we also present the evolution of the MSEs of the computed estimators. The three numerical methods are carried out for

m =

10,000 iterations.

As we see in these figures, the MLEs of the parameters

α, β,

and k obtained by Newton’s, gradient and conjugate gradient methods are

\sqrt{n}

-consistent. And we can say that the conjugate gradient method gives the best results. In this way, we have numerically checked the well known properties of MLEs, namely the consistency and asymptotic normality, by verifying that the values of the mean square errors obtained are

\sqrt{n}

-consistent, which confirm the theory of the method used.

4. Model Selection

We want also to take into account model selection criteria in order to choose among several candidate models, BP, Beta, Pareto, Gamma and Generalized Beta-Pareto distributions. The Generalized BP (GBP) distribution is a 5-parameter distribution introduced in Reference [19], with a different form given in Reference [20]. The density of this distribution is defined by

g (x; α, β, k, θ, c) : = \frac{c}{B (α, β)} {[1 - {(\frac{θ}{x})}^{k}]}^{c θ - 1} {[1 - {(1 - {(\frac{θ}{x})}^{k})}^{c}]}^{β - 1} \frac{k}{θ} {(\frac{θ}{x})}^{k + 1}, x \geq θ > 0,

(8)

with the parameters

α, β, θ, c, k > 0 .

For the model selection problem, we considered the general information criterion (GIC)

G I C (K) = - 2 ln L + I \times K,

where: L is the likelihood, K is the number of parameters of the model, I is an index for the penalty of the model. The following well-known criteria are obtained for different values of I:

A I C (K) = - 2 ln L + 2 K

B I C (K) = - 2 ln L + ln (n) K

and

A I C c (K) = A I C (K) + \frac{2 K (K + 1)}{n - K - 1},

where n is the sample size.

We have simulated data according to BP distribution, Pareto (P) distribution, Gamma (G) distribution and GBP distribution. We have computed the three criteria AIC, BIC and AICc and recorded in Table 7, Table 8, Table 9 and Table 10 the number of times in 1000 iterations that each of the models is selected (minimum value of the corresponding criteria).

Several remarks need to be done. First, since the support of a BP distribution is

[θ, + \infty),

it makes sense to compare data from BP distribution with data from Gamma, Pareto and GBP distributions. For example, when

θ

is close to

0,

the support of the BP distribution is close to the support of Gamma distribution. BP distribution reduces to Pareto distribution in the case where

α = β = 1,

and the support is the same.

Second, note that we have added the Beta (B) distribution as one of the candidate distributions to be chosen by the information criteria. Although this is not a “real” candidate (since the support is

(0, 1)

), we wanted to check also the model selection criteria in the presence of an “unusual” model. Clearly, it does not make sense to compare data from BP, GBP, Gamma or Pareto distribution, on the one side, with data from Beta distribution, on the other side. Nonetheless, it is important to have an idea about how the model criteria behave in this case.

We can notice that all criteria choose the correct model in all the 4 situations, for moderate or even small values of the sample size.

We can also remark that, when the underlying model is the GBP, for small values of the sample size n (less than 50), the criteria fail in choosing the correct model; nonetheless, starting from reasonable values of the sample size n (around 50 or 100), the correct model is chosen by all criteria in most of the cases. This phenomenon could be generated by the fact that the number of parameters in the GBP model is the highest one, 5, which has an influence on the information criteria.

We have also noticed a strange phenomenon: also when the underlying model is the GBP, for small values of the sample size

n,

the model preferred by the criteria is the Beta model instead of the GBP, that is to say the“unusual” model, as previously explained. The fact is more accentuated for the BIC criteria, so surely it is related to the penalization due to the number of parameters.

Another remark related to the difference between BP and GBP models needs to be done. When comparing Table 7 and Table 10, we can notice an asymmetry between the model selection criteria when the underlying true models are the BP, respectively the GBP model. On the one side, when the underlying model is the GBP (Table 7), the BP model is chosen by all criteria in 15–20% of cases for small values of the sample size. On the other side, when the underlying model is the BP (Table 10), the GBP model is never chosen. Although we think that this phenomenon could be related to the number of parameters of each model, to the presence of the other models investigated by the criteria as candidate models (Pareto, Beta, Gamma), or to the particular case that we considered in simulations, we do not have a complete explanation for this phenomenon.

5. MLEs for Right-Censored Data

Let us consider that the random right-censorship C is non-informative. Roughly speaking, a censorship is non-informative in the sense that the censoring distribution provides no information about the lifetime distribution. If the censoring mechanism is independent of the survival time, then we will have non-informative censoring. In fact, in practice we almost always mean independent censoring by non-informative censoring. It is well known that if the censoring variable C depends on the lifetime X, we run into the problem of non-identifiability: starting from observations, we cannot make inference about X or C, because several different distributions for

(X, C)

could provide the same distribution for the observation. For these reasons, researchers almost always consider independence between censoring and lifetime. In most practical examples this makes sense; for instance, if the lifetime represents the remission time of patients and the censoring mechanism comes from loss to follow-up (patients change town, for instance), it is natural to consider that we have independence between censoring and lifetime.

In our case, suppose that the variables X and C have respectively the probability density functions f and g and survival functions S and

G;

all the information is contained in the couple

(T_{j}, δ_{j})

where

T_{j} = min (X_{j}, C_{j})

is the observed time and

δ_{j} = 1_{(X_{j} \leq C_{j})}

is the censorship indicator. So, the contribution to the likelihood for the individual j is

L_{j} = {[f (t_{j} | V) G (t_{j})]}^{δ_{j}} {[g (t_{j}) S (t_{j} | V)]}^{1 - δ_{j}} .

Note that the term

{[f (t_{j} | V) G (t_{j})]}^{δ_{j}}

corresponds to the case

δ_{j} = 1,

that is to say to the case when the lifetime is observed; in this situation, the contribution to the likelihood of the lifetime X is

f (t_{j} | V),

while the contribution to the likelihood of the censorship is

G (t_{j}) .

The second term has an analogous interpretation in the case when

δ_{j} = 0,

that is to say in the case when the censorship is observed.

We also assume that there are no common parameters between the censoring and the lifetime; consequently, the parameter V does not appear in the distribution of the censorship. The useful part of the likelihood (for obtaining the MLEs of interest, that is, the MLEs of the BP distribution) is then reduced to

L = \prod_{j = 1}^{m} f {(t_{j} | V)}^{δ_{j}} S {(t_{j} | V)}^{1 - δ_{j}}

and the log-likelihood is

\begin{matrix} l & : = & ln L = \sum_{j = 1}^{m} δ_{j} ln f (t_{j} | V) + \sum_{j = 1}^{m} (1 - δ_{j}) ln S (t_{j} | V) \\ = & \sum_{j = 1}^{m} δ_{j} [ln (\frac{k}{θ B (α, β)}) + (α - 1) ln (1 - {(\frac{t_{j}}{θ})}^{- k}) + (- k β - 1) ln (\frac{t_{j}}{θ})] \\ + \sum_{j = 1}^{m} (1 - δ_{j}) ln [\frac{{(\frac{t_{j}}{θ})}^{- k β}}{β B (α, β)} + \frac{{(\frac{t_{j}}{θ})}^{β}}{β B (α + β)} \sum_{n = 1}^{\infty} \frac{\prod_{i = 1}^{n} (i - α)}{n! (β + n)} {(\frac{t_{j}}{θ})}^{- n k}], \end{matrix}

where

x \geq θ

and

α, β, k, θ > 0 .

Consequently we get

\begin{matrix} l & = & ln (\frac{k}{θ}) \sum_{j = 1}^{m} δ_{j} - m ln (B (α, β)) + \sum_{j = 1}^{m} δ_{j} (α - 1) ln (1 - {(\frac{t_{j}}{θ})}^{- k}) - \sum_{j = 1}^{m} δ_{j} ln (\frac{t_{j}}{θ}) \\ - k β \sum_{j = 1}^{m} ln (\frac{t_{j}}{θ}) + \sum_{j = 1}^{m} (1 - δ_{j}) ln [β^{- 1} + \sum_{n = 1}^{\infty} \frac{\prod_{i = 1}^{n} (i - α)}{n! (β + n)} {(\frac{t_{j}}{θ})}^{- n k}] . \end{matrix}

(9)

Setting

z (t_{j}, α, β, k, θ) : = β^{- 1} + \sum_{n = 1}^{\infty} \frac{\prod_{i = 1}^{n} (i - α)}{n! (β + n)} {(\frac{t_{j}}{θ})}^{- n k}

and

z_{α} (t_{j}, α, β, k, θ) : = \frac{\partial z}{\partial α} (t_{j}, α, β, k, θ),

z_{β} (t_{j}, α, β, k, θ) : = \frac{\partial z}{\partial β} (t_{j}, α, β, k, θ),

z_{k} (t_{j}, α, β, k, θ) : = \frac{\partial z}{\partial k} (t_{j}, α, β, k, θ),

we obtain the score functions

\begin{matrix} \frac{\partial l (α, β, k, θ)}{\partial α} & = & m [Ψ (α + β) - Ψ (α)] + \sum_{j = 1}^{m} δ_{j} ln [1 - {(\frac{t_{j}}{θ})}^{- k}] + \sum_{j = 1}^{m} (1 - δ_{j}) \frac{z_{α} (t_{j}, α, β, k, θ)}{z (t_{j}, α, β, k, θ)} \\ \frac{\partial l (α, β, k, θ)}{\partial β} & = & m [Ψ (β) - Ψ (β + α)] - k \sum_{j = 1}^{m} ln (\frac{t_{j}}{θ}) + \sum_{j = 1}^{m} (1 - δ_{j}) \frac{z_{β} (t_{j}, α, β, k, θ)}{z (t_{j}, α, β, k, θ)} \\ \frac{\partial l (α, β, k, θ)}{\partial k} & = & k^{- 1} \sum_{j = 1}^{m} δ_{j} - \sum_{j = 1}^{m} [β + δ_{j} ln (\frac{t_{j}}{θ}) {(1 - {(\frac{t_{j}}{θ})}^{k})}^{- 1}] ln (\frac{t_{j}}{θ}) + \sum_{j = 1}^{m} (1 - δ_{j}) \frac{z_{k} (t_{j}, α, β, k, θ)}{z (t_{j}, α, β, k, θ)} . \end{matrix}

5.1. Conjugate Gradient Method for Parameter MLEs of the BP Distribution with Right-Censored Data

As we have already mentioned, the MLE of the parameter

θ

is the first order statistic. Let us fix an arbitrary

ϵ > 0 .

The algorithm based on the conjugate gradient that we propose is as follows.

1. Step 0: Initialization

V_{0} = (α_{0}, β_{0}, k_{0}, θ_{0}) .

2. Step 1: Computation of the gradient

\begin{matrix} \nabla l (α, β, k, θ) = (\begin{matrix} \frac{\partial l (α, β, k, θ)}{\partial α} \\ \frac{\partial l (α, β, k, θ)}{\partial β} \\ \frac{\partial l (α, β, k, θ)}{\partial k} \end{matrix}) \\ = & (\begin{matrix} m [Ψ (α + β) - Ψ (α)] + \sum_{j = 1}^{m} δ_{j} ln [1 - {(\frac{t_{j}}{θ})}^{- k}] + \sum_{j = 1}^{m} (1 - δ_{j}) \frac{z_{α} (t_{j}, α, β, k, θ)}{z (t_{j}, α, β, k, θ)} \\ m [Ψ (β) - Ψ (β + α)] - k \sum_{j = 1}^{m} ln (\frac{t_{j}}{θ}) + \sum_{j = 1}^{m} (1 - δ_{j}) \frac{z_{β} (t_{j}, α, β, k, θ)}{z (t_{j}, α, β, k, θ)} \\ k^{- 1} \sum_{j = 1}^{m} δ_{j} - \sum_{j = 1}^{m} [β + δ_{j} ln (\frac{t_{j}}{θ}) {[1 - {(\frac{t_{j}}{θ})}^{k}]}^{- 1}] ln (\frac{t_{j}}{θ}) + \sum_{j = 1}^{m} (1 - δ_{j}) \frac{z_{k} (t_{j}, α, β, k, θ)}{z (t_{j}, α, β, k, θ)} \end{matrix}) . \end{matrix}

If

\nabla l (α, β, θ, k) \leq ϵ,

then stop,

V_{0} = (α_{0}, β_{0}, k_{0}, θ_{0})

is the optimal vector. If not, go to the next step.

3. Step 2: Computation of the direction of descent

If

i = 0

, then

\begin{matrix} d_{0} = - \nabla l (α_{0}, β_{0}, k_{0}, θ_{0}) \\ = & (\begin{matrix} - m [Ψ (α_{0} + β_{0}) + Ψ (α_{0})] - \sum_{j = 1}^{m} δ_{j} ln [1 - {(\frac{t_{j}}{θ_{0}})}^{- k_{0}}] - \sum_{j = 1}^{m} (1 - δ_{j}) \frac{z_{α} (t_{j}, α_{0}, β_{0}, k_{0}, θ_{0})}{z (t_{j}, α_{0}, β_{0}, k_{0}, θ_{0})} \\ - m [Ψ (β_{0}) + Ψ (β_{0} + α_{0})] - k_{0} \sum_{j = 1}^{m} ln (\frac{t_{j}}{θ_{0}}) - \sum_{j = 1}^{m} (1 - δ_{j}) \frac{z_{β} (t_{j}, α_{0}, β_{0}, k_{0}, θ_{0})}{z (t_{j}, α_{0}, β_{0}, k_{0}, θ_{0})} \\ - k_{0}^{- 1} \sum_{j = 1}^{m} δ_{j} + \sum_{j = 1}^{m} [β_{0} - δ_{j} ln (\frac{t_{j}}{θ_{0}}) {[1 - {(\frac{t_{j}}{θ_{0}})}^{k_{0}}]}^{- 1}] ln (\frac{t_{j}}{θ_{0}}) - \sum_{j = 1}^{m} (1 - δ_{j}) \frac{z_{k} (t_{j}, α_{0}, β_{0}, k_{0}, θ_{0})}{z (t_{j}, α_{0}, β_{0}, k_{0}, θ_{0})} \end{matrix}) . \end{matrix}

If

i > 0,

then

d_{i + 1} = - \nabla l (α_{i}, β_{i}, k_{i}, θ_{i}) + ω_{i} d_{i};

using the Fletcher-Reeves’ method we get

ω_{i}^{F R} = \frac{∥ \nabla l (α_{i + 1}, β_{i + 1}, k_{i + 1}, θ_{i + 1}) ∥^{2}}{∥ \nabla l (α_{i}, β_{i}, k_{i}, θ_{i}) ∥^{2}} .

4. Step 3: Computation of

V_{1} = (α_{1}, β_{1}, k_{1}, θ_{1})

V_{1} = V_{0} + λ_{0} d_{0},

such that

d_{0} = - \nabla l (α_{0}, β_{0}, k_{0}, θ_{0}) .

Computation of a step $λ_{0}$ : We can find

λ_{0}

with exact and inaccurate linear search methods. In this case, the use of the exact linear search helps us to have a fast convergence. The exact linear search method is to solve the problem

min_{λ_{0} \in R^{+}} l (V_{0} + λ_{0} d_{0}) .

To solve this problem, we must find the value of

λ_{0}

such that

\nabla l (V_{0} + λ_{0} d_{0}) = 0 :

\begin{matrix} α_{0} - λ_{0} \times \\ [- m [Ψ (α_{0} + β_{0}) + Ψ (α_{0})] - \sum_{j = 1}^{m} δ_{j} ln [1 - {(\frac{t_{j}}{θ_{0}})}^{- k_{0}}] - \sum_{j = 1}^{m} (1 - δ_{j}) \frac{z_{α} (t_{j}, α_{0}, β_{0}, k_{0}, θ_{0})}{z (t_{j}, α_{0}, β_{0}, k_{0}, θ_{0})}] = 0, \\ β_{0} - λ_{0} [- m [Ψ (β_{0}) + Ψ (β_{0} + α_{0})] - k_{0} \sum_{j = 1}^{m} ln (\frac{t_{j}}{θ_{0}}) - \sum_{j = 1}^{m} (1 - δ_{j}) \frac{z_{β} (t_{j}, α_{0}, β_{0}, k_{0}, θ_{0})}{z (t_{j}, α_{0}, β_{0}, k_{0}, θ_{0})}] = 0, \\ k_{0} - λ_{0} \times \\ [- k_{0}^{- 1} \sum_{j = 1}^{m} δ_{j} + \sum_{j = 1}^{m} [β_{0} - δ_{j} ln (\frac{t_{j}}{θ_{0}}) {[1 - {(\frac{t_{j}}{θ_{0}})}^{k_{0}}]}^{- 1}] ln (\frac{t_{j}}{θ_{0}}) - \sum_{j = 1}^{m} (1 - δ_{j}) \frac{z_{k} (t_{j}, α_{0}, β_{0}, k_{0}, θ_{0})}{z (t_{j}, α_{0}, β_{0}, k_{0}, θ_{0})}] = 0 . \end{matrix}

Consequently

\begin{matrix} α_{0} + β_{0} + k_{0} + λ 0 \times \\ (\begin{matrix} [- m [Ψ (α_{0} + β_{0}) + Ψ (α_{0})] - \sum_{j = 1}^{m} δ_{j} ln [1 - {(\frac{t_{j}}{θ_{0}})}^{- k_{0}}] - \sum_{j = 1}^{m} (1 - δ_{j}) \frac{z_{α} (t_{j}, α_{0}, β_{0}, k_{0}, θ_{0})}{z (t_{j}, α_{0}, β_{0}, k_{0}, θ_{0})}] \\ + [- m [Ψ (β_{0}) + Ψ (β_{0} + α_{0})] - k_{0} \sum_{j = 1}^{m} ln (\frac{t_{j}}{θ_{0}}) - \sum_{j = 1}^{m} (1 - δ_{j}) \frac{z_{β} (t_{j}, α_{0}, β_{0}, k_{0}, θ_{0})}{z (t_{j}, α_{0}, β_{0}, k_{0}, θ_{0})}] \\ + [- k_{0}^{- 1} \sum_{j = 1}^{m} δ_{j} + \sum_{j = 1}^{m} [β_{0} - δ_{j} ln (\frac{t_{j}}{θ_{0}}) {[1 - {(\frac{t_{j}}{θ_{0}})}^{k_{0}}]}^{- 1}] ln (\frac{t_{j}}{θ_{0}}) - \sum_{j = 1}^{m} (1 - δ_{j}) \frac{z_{k} (t_{j}, α_{0}, β_{0}, k_{0}, θ_{0})}{z (t_{j}, α_{0}, β_{0}, k_{0}, θ_{0})}] \end{matrix}) = 0 \end{matrix}

and we obtain

\begin{matrix} λ_{0} & = & - (α_{0} + β_{0} + k_{0}) / [- m [Ψ (α_{0} + β_{0}) + Ψ (α_{0})] - \sum_{j = 1}^{m} δ_{j} ln [1 - {(\frac{t_{j}}{θ_{0}})}^{- k_{0}}] \\ - \sum_{j = 1}^{m} (1 - δ_{j}) \frac{z_{α} (t_{j}, α_{0}, β_{0}, k_{0}, θ_{0})}{z (t_{j}, α_{0}, β_{0}, k_{0}, θ_{0})} - m [Ψ (β_{0}) + Ψ (β_{0} + α_{0})] - k_{0} \sum_{j = 1}^{m} ln (\frac{t_{j}}{θ_{0}}) \\ - \sum_{j = 1}^{m} (1 - δ_{j}) \frac{z_{β} (t_{j}, α_{0}, β_{0}, k_{0}, θ_{0})}{z (t_{j}, α_{0}, β_{0}, k_{0}, θ_{0})} - k_{0}^{- 1} \sum_{j = 1}^{m} δ_{j} + \sum_{j = 1}^{m} [β_{0} - δ_{j} ln (\frac{t_{j}}{θ_{0}}) {[1 - {(\frac{t_{j}}{θ_{0}})}^{k_{0}}]}^{- 1}] ln (\frac{t_{j}}{θ_{0}}) \\ - \sum_{j = 1}^{m} (1 - δ_{j}) \frac{z_{k} (t_{j}, α_{0}, β_{0}, k_{0}, θ_{0})}{z (t_{j}, α_{0}, β_{0}, k_{0}, θ_{0})}] . \end{matrix}

5.2. Numerical Results for BP Parameter Estimation with Censored Data

To show the efficiency of the MLEs obtained by the gradient and conjugate gradient method when data are right censored, an important simulation study was realized. The results of the MLEs and their SDs are summarized in the Table 11 and Table 12.

To evaluate and compare the performance of the estimators obtained by the proposed methods, we present in the Table 13 and Table 14 the bias and MSEs of the estimators.

5.3. Model Selection for Censored Data

In this section, the censored data are used with the AIC, BIC and AICc criteria to choose the best statistical model for our statistical population. These results are presented in Table 15, Table 16, Table 17 and Table 18. As for the uncensored data, we can conclude that the information criteria choose the correct model even for small or medium values of the sample size

n .

Note that similar remarks as the ones done in Section 4 for model selection for complete (uncensored) data hold true also here.

6. Conclusions

In this paper, we have developed different optimization methods to determine the maximum likelihood estimators of the four parameters of the Beta-Pareto distribution. The results obtained showed that all the used methods give

\sqrt{n}

-consistent estimators in both complete and right-censored data samples and particularly the conjugate gradient method gives the best results. Using classical model selection criteria, our study proves that this new model can be used instead of several alternative models such as Beta, Pareto, Gamma, and Generalized Beta-Pareto. Another important contribution of our work is the use of right-censored data as an input for the estimation procedure and model selection techniques.

Author Contributions

All authors have equally contributed to different stages of the article. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

Akinsete, A.; Famoye, F.; Lee, C. The Beta-Pareto distribution. Statistics 2008, 42, 547–563. [Google Scholar] [CrossRef]
Asmussen, S. Applied Probability and Queues; Springer: New York, NY, USA, 2003. [Google Scholar]
Zhang, J. Likelihood moment estimation for the generalized Pareto distribution. Aust. N. Z. J. Stat. 2007, 49, 69–77. [Google Scholar] [CrossRef]
Levy, M.; Levy, H. Investment talent and the Pareto wealth distribution: Theoretical and experimental analysis. Rev. Econ. Stat. 2003, 85, 709–725. [Google Scholar] [CrossRef]
Burroughs, S.M.; Tebbens, S.F. Upper-truncated power law distributions. Fractals 2001, 9, 209–222. [Google Scholar] [CrossRef]
Newman, M.E. Power laws, Pareto distributions and Zipf’s law. Contemp. Phys. 2005, 46, 323–351. [Google Scholar] [CrossRef] [Green Version]
Choulakian, V.; Stephens, M.A. Goodness-of-fit tests for the generalized Pareto distribution. Technometrics 2001, 43, 478–484. [Google Scholar] [CrossRef]
Conn, A.R.; Gould, N.I.M.; Toint, P.L. Convergence of quasi-Newton matrices generated by the symmetric rank one update. Math. Program. 1991, 50, 177–195. [Google Scholar] [CrossRef]
Hestenes, M.R.; Stiefel, E. Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bureau Stand. 1952, 49, 409–436. [Google Scholar] [CrossRef]
Fletcher, R.; Reeves, C.M. Function minimization by conjugate gradients. Computer J. 1964, 7, 149–154. [Google Scholar] [CrossRef] [Green Version]
Polak, E.; Ribiere, G. Note sur la convergence de méthodes de directions conjuguées. ESAIM Math. Model. Numer. Anal. 1969, 3, 35–43. (In French) [Google Scholar] [CrossRef]
Fletcher, R. Practical Methods of Optimization, 2nd ed.; Wiley: Hoboken, NJ, USA, 2000. [Google Scholar]
Armijo, L. Minimization of functions having Lipschitz continuous first partial derivatives. Pac. J. Math. 1966, 16, 1–3. [Google Scholar] [CrossRef] [Green Version]
Goldstein, A.A. On steepest descent. J. Soc. Ind. Appl. Math. Ser. A Control 1965, 3, 147–151. [Google Scholar] [CrossRef]
Wolfe, P. Convergence conditions for ascent methods. SIAM Rev. 1969, 11, 226–235. [Google Scholar] [CrossRef]
Fletcher, R.; Powell, M.J.D. A rapidly convergent descent method for minimization. Comp. J. 1963, 6, 163–168. [Google Scholar] [CrossRef]
Sato, H.; Iwai, T. A new, globally convergent Riemannian conjugate gradient method. Optimization 2013, 64, 1011–1031. [Google Scholar] [CrossRef] [Green Version]
Dai, Y.H.; Yuan, Y. A nonlinear conjugate gradient method with a strong global convergence property. SIAM J. Optim. 1999, 10, 177–182. [Google Scholar] [CrossRef] [Green Version]
Nassar, M.M.; Nada, N.K. The beta generalized Pareto distribution. J. Stat. Adv. Theory Appl. 2011, 6, 1–17. [Google Scholar]
Mahmoudi, E. The beta generalized Pareto distribution with application to lifetime data. Math. Comp. Simul. 2011, 81, 2414–2430. [Google Scholar] [CrossRef]

Figure 1. Density of the BP distribution.

Figure 2. Survival function of the BP distribution.

Figure 3. Cumulative distribution function of the BP distribution.

Figure 4. Hazard rate of the BP distribution.

Figure 5. Cumulative hazard rate of the BP distribution.