Article

Estimation and Bayesian Prediction of the Generalized Pareto Distribution in the Context of a Progressive Type-II Censoring Scheme

1 Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC 27599, USA
2 School of Mathematics and Statistics, Beijing Jiaotong University, Beijing 100044, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(18), 8433; https://doi.org/10.3390/app14188433
Submission received: 16 August 2024 / Revised: 8 September 2024 / Accepted: 17 September 2024 / Published: 19 September 2024
(This article belongs to the Special Issue Novel Applications of Machine Learning and Bayesian Optimization)

Abstract: The generalized Pareto distribution plays a significant role in reliability research. This study concentrates on statistical inference for the generalized Pareto distribution utilizing progressively Type-II censored data. Maximum likelihood estimation is performed via the expectation–maximization algorithm, and confidence intervals are constructed from the asymptotic normality of the estimators. Bayesian estimations are conducted using the Tierney and Kadane method alongside the Metropolis–Hastings algorithm, and highest posterior density credible intervals are obtained. Furthermore, Bayesian predictive intervals and estimates of future samples are explored. To illustrate these inference techniques, a simulation study and a practical example are presented.

1. Introduction

In recent years, there has been a growing focus on the analysis of life test data. Within the field of reliability analysis, it is commonplace for data to be only partially observed, leading to the frequent use of censored data. According to the research conducted by [1], a method was employed to estimate the population parameters of a life test distribution by subdividing the failure population into subpopulations that adhere to an exponential distribution. The sampling process was censored at a predetermined test termination time. In the analysis of lifetime data, two censoring schemes are commonly utilized: Type-I and Type-II censoring. In Type-I censoring, the censoring time is established in advance, while the number of observed failed units is variable. Conversely, in Type-II censoring, the lifetime test concludes upon reaching a predetermined number of failures, with the termination time being random.
One limitation of traditional censoring models is that live units can only be removed at the final termination point. In numerous scenarios involving life tests, it may be necessary to eliminate surviving units from the test prior to reaching the final termination point. In light of the work conducted by [2], a definition for progressive Type-II censoring models has been proposed.
Numerous researchers have examined different lifetime distributions utilizing progressive Type-II censoring data. For instance, the study conducted by [3] focused on linear inference for progressive Type-II censoring order statistics across various population distribution families, including scale, location, and location–scale families such as the Weibull distribution. Additionally, ref. [4] presented a Bayesian analysis of the Rayleigh distribution based on progressively Type-II censored samples.
The Generalized Pareto Distribution (GPD) is extensively used in life test analysis across multiple domains, including biology and geography. For instance, the study by [5] explored the applications of GPD in wind engineering, highlighting its advantages over the generalized extreme value distribution in the context of extreme value analysis. Additionally, ref. [6] investigated the effects of event independence and threshold selection by conducting a GPD analysis on gust velocity maxima recorded at several island locations.
In this paper, we utilize the generalized Pareto distribution with unknown parameters, as discussed by [7], to estimate these parameters using progressive Type-II censoring samples and to make corresponding Bayesian predictions.
In recent years, numerous researchers have made significant contributions to the study of the parameters of the generalized Pareto distribution. For instance, ref. [8] investigated minimum density power divergence estimations (MDPDE) for GPD parameters and conducted a comparison of the efficiencies among MDPDE, maximum likelihood estimations (MLE), Dupuis’s optimally biased robust estimator, and the medians estimator proposed by Peng and Welsh. Additionally, ref. [9] addressed statistical inference challenges related to GPD within the framework of progressive Type-I censoring. The author successfully derived MLEs using the Expectation–Maximization (EM) algorithm along with the Fisher information matrix, and they also formulated asymptotic confidence intervals for the parameters. The discussion in [10] focused on estimating GPD parameters through MLE, the probability-weighted moments method, and the method of moments. Moreover, ref. [11] implemented bootstrap algorithms and Monte Carlo methods to derive Bayesian estimations and confidence intervals for GPD. Furthermore, ref. [12] utilized LINEX, entropy loss, and precautionary functions to achieve Bayesian estimations of GPD parameters, employing quasi, uniform prior, and inverted gamma distributions. For further details and examples related to the generalized Pareto distribution, please refer to the following articles: [6,7,9,13,14,15,16].
The remainder of this article is structured as follows: Section 2 will discuss Maximum Likelihood Estimators (MLEs). Section 3 outlines the observed Fisher information matrix and the methodology for calculating asymptotic confidence intervals. In Section 4, we derive Bayesian estimations of unknown parameters utilizing various loss functions. These estimations are then computed using the Tierney and Kadane (TK) method. Section 4 also introduces the Metropolis–Hastings (MH) algorithm, which is employed to obtain Bayesian estimations. The samples generated using this approach are utilized to construct the highest posterior density intervals. In Section 5, we present Bayesian point predictions and interval predictions for future observable samples. Section 6 and Section 7 offer simulation studies and compare the performance of the discussed approaches, with a real dataset being analyzed for demonstration purposes. Finally, we present our conclusions in Section 8.

2. Maximum Likelihood Estimation

In the progressive Type-II censored model, a total of $n$ units are placed on a reliability test, and only $m$ units are monitored until failure. Upon the occurrence of the first failure ($x_1$), $R_1$ units are randomly removed from the remaining $n-1$ surviving units. This procedure is repeated at each subsequent failure, with $R_2$ units removed following the second failure ($x_2$), and so forth, until the remaining $R_m$ units are removed after the $m$-th failure. The censoring scheme, denoted as $R = (R_1, \ldots, R_m)$, is established prior to the commencement of the test and must comply with the conditions $0 < x_1 < x_2 < \cdots < x_m$ and $n = R_1 + \cdots + R_m + m$. Progressive Type-II censoring provides both practicality and flexibility by permitting units to be removed at any phase following a failure.
Here, $X = (X_1, \ldots, X_m)$ represents the corresponding progressively censored sample derived from the generalized Pareto distribution.
A random variable X is said to follow a generalized Pareto distribution if its probability density function (PDF) and cumulative distribution function (CDF) are defined as follows:
$$ f(x; \alpha, \lambda) = \lambda\alpha (1 + \lambda x)^{-(\alpha+1)}, \qquad x > 0, $$
$$ F(x; \alpha, \lambda) = 1 - (1 + \lambda x)^{-\alpha}, \qquad x > 0, $$
where the parameters associated with the scale and shape are denoted as λ > 0 and α > 0 . We refer to this distribution as G P D ( α , λ ) . It is also recognized as the second kind of Pareto distribution or the Lomax distribution.
The likelihood function for samples obtained from the progressive Type-II censoring scheme is as follows:
$$ l(\alpha, \lambda \mid \hat{x}) = \alpha^m \lambda^m \prod_{i=1}^{m} (\lambda x_i + 1)^{-(\alpha+1)} \left[ (\lambda x_i + 1)^{-\alpha} \right]^{R_i}, $$
where $\hat{x} = (x_1, \ldots, x_m)$ represents the observed value of $X = (X_1, \ldots, X_m)$. Consequently, the log-likelihood function is
$$ L(\alpha, \lambda \mid \hat{x}) = m \ln(\alpha\lambda) - (\alpha+1) \sum_{i=1}^{m} \ln(\lambda x_i + 1) - \alpha \sum_{i=1}^{m} R_i \ln(\lambda x_i + 1), $$
and the likelihood equations are
$$ \frac{\partial L}{\partial \alpha} = \frac{m}{\alpha} - \sum_{i=1}^{m} (1 + R_i) \ln(\lambda x_i + 1) = 0, $$
$$ \frac{\partial L}{\partial \lambda} = \frac{m}{\lambda} - (\alpha+1) \sum_{i=1}^{m} \frac{x_i}{\lambda x_i + 1} - \alpha \sum_{i=1}^{m} \frac{R_i x_i}{\lambda x_i + 1} = 0. $$
To obtain maximum likelihood estimates, the traditional approach is the Newton–Raphson method, as referenced in [17]. However, a significant challenge with this method is the need to compute the second-order partial derivatives of the log-likelihood function, and in this censored model, determining the roots of Equations (5) and (6) can often be quite complex. Therefore, the expectation–maximization (EM) algorithm, as introduced in [18], is employed to address this issue and facilitate the calculation of the MLEs. The EM algorithm converges reliably to a stationary point and requires only that the pseudo log-likelihood function of the complete sample be maximized at each iteration, thereby effectively resolving the aforementioned challenges.
The EM algorithm consists of two main steps. The E-step entails calculating the expectation of the log-likelihood function using censored data based on the current parameter estimates. The M-step involves maximizing the log-likelihood function to obtain updated parameter estimates.
Let $Z = (Z_1, \ldots, Z_m)$ represent the censored samples, where each $Z_i$ is a $1 \times R_i$ vector recorded as $Z_i = (Z_{i1}, Z_{i2}, \ldots, Z_{iR_i})$, $i = 1, 2, \ldots, m$. The variable $X = (X_1, \ldots, X_m)$ denotes the observed sample, and $Z_i$ signifies the censored data removed following the failure of $X_i$. We define $Q = (X, Z)$ to represent the complete sample, and the vector $\hat{x} = (x_1, \ldots, x_m)$ corresponds to the observed values of $X$. The log-likelihood function $L_c(\alpha, \lambda, Q)$ for the complete sample is given as follows:
$$ L_c(\alpha, \lambda, Q) = n \ln(\alpha\lambda) - (1+\alpha) \sum_{i=1}^{m} \ln(\lambda x_i + 1) - (1+\alpha) \sum_{i=1}^{m} \sum_{k=1}^{R_i} \ln(1 + \lambda z_{ik}). $$
  • E-step
The pseudo log-likelihood function is expressed as follows:
$$ L_s(\alpha, \lambda) = n \ln(\alpha\lambda) - (1+\alpha) \sum_{i=1}^{m} \ln(\lambda x_i + 1) - (1+\alpha) \sum_{i=1}^{m} \sum_{k=1}^{R_i} E\left( \ln(1 + \lambda z_{ik}) \mid z_{ik} > x_i \right), $$
where
$$ E\left( \ln(1 + \lambda z_{ik}) \mid z_{ik} > x_i \right) = \frac{\alpha\lambda}{1 - F(x_i; \alpha, \lambda)} \int_{x_i}^{\infty} (\lambda t + 1)^{-(\alpha+1)} \ln(\lambda t + 1)\, dt = \frac{\alpha}{1 - F(x_i; \alpha, \lambda)} \int_{0}^{1/(\lambda x_i + 1)} y^{\alpha-1} (-\ln y)\, dy = A(x_i; \alpha, \lambda). $$
  • M-step
In the M-step, we focus on maximizing the likelihood function L s with respect to α and λ . Let us denote the s-th estimates of α and λ as ( α ( s ) , λ ( s ) ) . These estimates will be utilized to optimize the expression in function (8). Consequently, the updated estimate ( α ( s + 1 ) , λ ( s + 1 ) ) is then obtained by maximizing the following:
$$ L_s(\alpha, \lambda) = n \ln(\alpha\lambda) - (1+\alpha) \sum_{i=1}^{m} \ln(\lambda x_i + 1) - (1+\alpha) \sum_{i=1}^{m} R_i A(x_i; \alpha^{(s)}, \lambda^{(s)}). $$
As a result, the associated likelihood equations are
$$ \frac{\partial L_s}{\partial \alpha} = \frac{n}{\alpha} - \sum_{i=1}^{m} \ln(\lambda x_i + 1) - \sum_{i=1}^{m} R_i A(x_i; \alpha^{(s)}, \lambda^{(s)}) = 0 $$
and
$$ \frac{\partial L_s}{\partial \lambda} = \frac{n}{\lambda} - (1+\alpha) \sum_{i=1}^{m} \frac{x_i}{\lambda x_i + 1} = 0. $$
The estimate of $\alpha$ at the $(s+1)$-th stage can be characterized as follows:
$$ \hat{\alpha}(\lambda) = \frac{n}{\sum_{i=1}^{m} \ln(\lambda x_i + 1) + \sum_{i=1}^{m} R_i A(x_i; \alpha^{(s)}, \lambda^{(s)})}. $$
Therefore, the maximization of (10) can be obtained by solving the following fixed-point equation:
$$ \Delta(\lambda) = \lambda, $$
where
$$ \Delta(\lambda) = \frac{n}{(1 + \hat{\alpha}(\lambda)) \sum_{i=1}^{m} \dfrac{x_i}{\lambda x_i + 1}}. $$
Once the difference | λ ( s + 1 ) λ ( s ) | falls below the specified tolerance limit, the iteration process concludes. After  λ ( s + 1 ) is obtained, α ( s + 1 ) is computed as α ( s + 1 ) = α ^ ( λ ( s + 1 ) ) . To achieve the maximum likelihood estimators for both α and λ , it is essential to iteratively perform the E-step and M-step until the program converges.
The steps required for the implementation of the EM algorithm are as follows:
(1) Choose initial values $(\alpha^{(0)}, \lambda^{(0)})$.
(2) Calculate the expectation $A(x_i; \alpha^{(s)}, \lambda^{(s)})$.
(3) Solve the fixed-point equation in (14).
(4) Set $(\alpha^{(s+1)}, \lambda^{(s+1)})$ to the solutions of step (3).
(5) Repeat until convergence.

3. Asymptotic Confidence Intervals

According to [19], let Q represent the complete data and U represent the observed data. We denote I U ( θ ) as the observed information and I Q ( θ ) as the complete information, and  θ = ( α , λ ) . Subsequently, we introduce a new function, I Q | U ( θ ) , to represent the missing information.
$$ I_{Q|U}(\theta) + I_U(\theta) = I_Q(\theta), $$
where
$$ I_Q(\theta) = E\left[ -\frac{\partial^2 \ln L_c(Q; \theta)}{\partial \theta^2} \right]. $$
With the i-th observed data x i , we derive the I Q | U ( i ) ( θ ) , which represents the Fisher information for censored data.
$$ I_{Q|U}^{(i)}(\theta) = E_{Z_i \mid x_{(i)}}\left[ -\frac{\partial^2 \ln f_{Z_i}(z_i \mid x_{(i)}, \theta)}{\partial \theta^2} \right]. $$
As a result, we can obtain the complete missing information as follows:
$$ I_{Q|U}(\theta) = \sum_{i=1}^{m} R_i\, I_{Q|U}^{(i)}(\theta). $$
Additionally, we employ a numerical technique to obtain I Q ( θ ) :
$$ I_Q(\theta) = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}. $$
The expressions for $C_{11}$, $C_{12}$, $C_{21}$, and $C_{22}$ are provided in Appendix A.
Next, we derive the I Q | U ( i ) ( θ ) and consider its observed information, which encompasses the expectations outlined in (18):
$$ I_{Q|U}^{(i)}(\theta) = \begin{pmatrix} D_{11}(x_{(i)}, \alpha, \lambda) & D_{12}(x_{(i)}, \alpha, \lambda) \\ D_{21}(x_{(i)}, \alpha, \lambda) & D_{22}(x_{(i)}, \alpha, \lambda) \end{pmatrix}. $$
The representations of $D_{11}$, $D_{12}$, $D_{21}$, and $D_{22}$ are provided in Appendix A.
By employing the two matrices referenced above, we can derive the observed information matrix $I_U(\theta)$ and then calculate the variances of the maximum likelihood estimates of the parameters individually. The asymptotic variance–covariance matrix of the MLEs of the parameters is given by $I_U^{-1}(\hat{\theta})$.
This allows us to establish $100(1-\kappa)\%$ asymptotic confidence intervals (ACIs) for the estimates, where $0 < \kappa < 1$. By the asymptotic normality of the MLE $\hat{\theta} = (\hat{\alpha}, \hat{\lambda})$, we have approximately $\hat{\theta} \sim N(\theta, I_U^{-1}(\hat{\theta}))$. Therefore, the ACI for $\lambda$ can be defined as follows:
$$ \left( \hat{\lambda} - u_{\kappa/2}\sqrt{\mathrm{Var}(\hat{\lambda})},\ \hat{\lambda} + u_{\kappa/2}\sqrt{\mathrm{Var}(\hat{\lambda})} \right), $$
where $u_{\kappa/2}$ denotes the upper $\kappa/2$-th quantile of the standard normal distribution, and $\mathrm{Var}(\hat{\lambda})$ is the corresponding principal diagonal element of $I_U^{-1}(\hat{\theta})$. We can similarly derive the ACI for $\alpha$.
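The interval construction reduces to inverting the 2×2 observed information matrix and reading off its diagonal; a minimal sketch (the example matrix in the usage note is hypothetical):

```python
import math
from statistics import NormalDist

def acis(theta_hat, info, kappa=0.05):
    """100(1 - kappa)% asymptotic confidence intervals for (alpha, lambda).

    theta_hat : MLEs (alpha_hat, lam_hat)
    info      : 2x2 observed information matrix I_U(theta_hat)
    The variance-covariance matrix is the inverse of info; each interval is
    estimate -/+ u_{kappa/2} * sqrt(corresponding diagonal variance).
    """
    (a, b), (c, d) = info
    det = a * d - b * c
    variances = (d / det, a / det)               # diagonal of the 2x2 inverse
    u = NormalDist().inv_cdf(1.0 - kappa / 2.0)  # upper kappa/2 quantile
    return [(t - u * math.sqrt(v), t + u * math.sqrt(v))
            for t, v in zip(theta_hat, variances)]
```

For instance, `acis((0.35, 1.5), [[40.0, 5.0], [5.0, 12.0]])` returns the two 95% intervals for a hypothetical information matrix.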

4. Bayesian Estimation

In contrast to maximum likelihood estimation, the Bayesian approach treats a parameter $\theta$ not as an unknown deterministic quantity but as a random variable. Upon observing the sample $D$, the likelihood $P(D \mid \theta)$ is combined with the prior to form the posterior probability $P(\theta \mid D)$, which is the basis of Bayesian estimation. The core objective of Bayesian estimation is to derive the optimal estimate of the parameter $\theta$ through Bayesian decision theory, thereby minimizing the total expected risk.
Bayesian statistics diverges from traditional statistical methods by emphasizing the incorporation of subjective prior information regarding the parameters used for the reliability analysis process. The prior distribution is derived from the experimenter’s subjective assessment, informed by data collected prior to experimentation. While this subjective approach may appear to conflict with the objective nature of scientific inquiry, it offers significant advantages by leveraging prior knowledge alongside the principles of the likelihood function. This approach effectively addresses the limitations of subjectivity in selecting sufficient statistics for classical estimation, allowing for a more coherent integration of sample randomness and sufficiency.
We will outline the procedure for Bayesian estimation. We begin by assuming that the parameters $\alpha$ and $\lambda$ follow independent gamma priors with hyperparameters $(a, b)$ and $(p, q)$, respectively. Furthermore, let $X = (X_{1:m:n}, \ldots, X_{m:m:n})$ denote the progressive Type-II observed sample derived from the generalized Pareto distribution, and let $\hat{x} = (x_1, \ldots, x_m)$ represent its observed value. The joint prior distribution is obtained as follows:
$$ \pi(\alpha, \lambda) = \frac{b^a q^p}{\Gamma(a)\Gamma(p)}\, \alpha^{a-1} \lambda^{p-1} e^{-b\alpha} e^{-q\lambda}, \qquad a, b > 0,\ p, q > 0. $$
Based on the previously mentioned joint prior distribution and the likelihood distribution, the joint posterior distribution can be articulated as follows:
$$ \pi(\alpha, \lambda \mid \hat{x}) = \frac{\pi(\alpha, \lambda)\, l(\alpha, \lambda \mid \hat{x})}{\int_0^\infty \int_0^\infty \pi(\alpha, \lambda)\, l(\alpha, \lambda \mid \hat{x})\, d\lambda\, d\alpha} \propto \alpha^{m+a-1} \lambda^{m+p-1} e^{-b\alpha - q\lambda} \prod_{i=1}^{m} (\lambda x_i + 1)^{-(\alpha+1)} (\lambda x_i + 1)^{-\alpha R_i}. $$
Given that π ( α , λ | x ^ ) involves the solution of a double integral, it presents significant analytical challenges. Therefore, we will utilize the TK algorithm as discussed by [20] alongside the MH algorithm to derive the approximate posterior distribution expectations. Additionally, we will obtain the approximate Bayesian estimations of α and λ .

4.1. Loss Functions

In this subsection, we will examine three loss functions utilized in Bayesian statistics to enhance our understanding of their mathematical properties. These include the Squared Error Loss function (SEL), the LINEX loss function, and the Balanced Squared Error Loss function (BSEL).
The function for SEL is as follows:
$$ L_{\mathrm{SEL}}(\sigma, \theta) = (\sigma - \theta)^2, $$
where $\sigma$ is a Bayesian estimate of $\theta$, calculated as $\hat{\sigma}_{\mathrm{SEL}} = E(\theta \mid X)$.
The SEL function is a conventional loss function; however, in many practical situations, its application without consideration for weighting or the differences between overestimation and underestimation can introduce bias into the results.
In such instances, it may be beneficial to employ an asymmetric loss function. As discussed by [13], among the various options available, the LINEX loss function is often preferred, which is as follows:
$$ L_{\mathrm{LINEX}}(\sigma, \theta) = \zeta\left( e^{k(\sigma - \theta)} - k(\sigma - \theta) - 1 \right). $$
It is important to note that we assume $\zeta = 1$ here without loss of generality. The corresponding Bayesian estimate of $\theta$ is as follows:
$$ \hat{\sigma}_{\mathrm{LINEX}} = -\frac{1}{k} \ln E\left( e^{-k\theta} \mid X \right). $$
In their work, ref. [21] introduced a balanced loss function that is commonly employed to integrate both goodness of fit and estimation accuracy into the evaluation of estimators. Let $\hat{x} = (x_1, \ldots, x_m)$ again represent the observed values of $X = (X_1, \ldots, X_m)$. The formulation of the BSEL function is described as follows:
$$ L_{\mathrm{BSEL}}(\sigma, \theta) = \gamma \rho(\sigma, \sigma^*) + (1 - \gamma) \rho(\sigma, \theta), $$
where $0 \le \gamma \le 1$. We select $\sigma^*$ as the maximum likelihood estimate of $\theta$, and $\rho$ represents the SEL function defined above. The Bayesian estimate under BSEL is
$$ \hat{\sigma}_{\mathrm{BSEL}} = (1 - \gamma) E(\theta \mid X) + \gamma \sigma^*. $$
Varying $\gamma$ covers additional scenarios. When $\sigma^*$ is the maximum likelihood estimator and $\gamma = 1$, the Bayesian estimate coincides with the MLE; conversely, when $\gamma = 0$, it reduces to the Bayesian estimate under the SEL function.
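The three point estimators above can be computed directly from a set of posterior draws (such as the MCMC output described later in this section); in this sketch the values of $k$ and $\gamma$ are arbitrary illustrative choices:

```python
import math

def bayes_point_estimates(draws, mle, k=0.5, gamma=0.3):
    """Bayes estimates of a scalar parameter under SEL, LINEX and BSEL losses,
    computed from posterior draws.

    SEL  : posterior mean E(theta | X)
    LINEX: -(1/k) * ln E(exp(-k*theta) | X)
    BSEL : gamma * sigma_star + (1 - gamma) * E(theta | X), sigma_star the MLE
    """
    n = len(draws)
    sel = sum(draws) / n
    linex = -(1.0 / k) * math.log(sum(math.exp(-k * t) for t in draws) / n)
    bsel = gamma * mle + (1.0 - gamma) * sel
    return {"SEL": sel, "LINEX": linex, "BSEL": bsel}
```

By Jensen's inequality the LINEX estimate with $k > 0$ always lies below the posterior mean, which is one way the asymmetry of the loss shows up in practice.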

4.2. TK Method

Firstly, the posterior expectation of $y(\alpha, \lambda)$ under the distribution $\pi(\alpha, \lambda \mid \hat{x})$ is
$$ E[y(\alpha, \lambda)] = \frac{\int_0^\infty \int_0^\infty y(\alpha, \lambda)\, e^{l(\alpha, \lambda \mid \hat{x}) + \ln \pi(\alpha, \lambda)}\, d\lambda\, d\alpha}{\int_0^\infty \int_0^\infty e^{l(\alpha, \lambda \mid \hat{x}) + \ln \pi(\alpha, \lambda)}\, d\lambda\, d\alpha}, $$
where l ( α , λ | x ^ ) represents the log-likelihood function and π ( α , λ ) indicates the prior density. For estimating the posterior expectation, we utilize the TK method to address the challenges associated with integrating ratio problems. To derive the explicit function for E y ( α , λ ) , we will examine the following functions:
$$ o(\alpha, \lambda) = \frac{\ln \pi(\alpha, \lambda) + l(\alpha, \lambda \mid \hat{x})}{n} $$
and
$$ o^*(\alpha, \lambda) = \frac{\ln y(\alpha, \lambda)}{n} + o(\alpha, \lambda). $$
We maximize the two functions separately, obtaining the maximizers $(\hat{\alpha}_1, \hat{\lambda}_1)$ and $(\hat{\alpha}_u, \hat{\lambda}_u)$, respectively. Subsequently, we approximate $E[y(\alpha, \lambda)]$ as follows:
$$ \hat{E}[y(\alpha, \lambda)] = \sqrt{\frac{|\Sigma^*|}{|\Sigma|}}\, \exp\left\{ n\left[ o^*(\hat{\alpha}_u, \hat{\lambda}_u) - o(\hat{\alpha}_1, \hat{\lambda}_1) \right] \right\}. $$
In this context, $|\Sigma|$ and $|\Sigma^*|$ denote the determinants of the negative inverse Hessian matrices of $o(\alpha, \lambda)$ and $o^*(\alpha, \lambda)$, respectively.
For the present model, we have
$$ o(\alpha, \lambda) = \frac{1}{n}\Big[ m \ln \alpha + m \ln \lambda - (1+\alpha) \sum_{i=1}^{m} \ln(\lambda x_i + 1) - \alpha \sum_{i=1}^{m} R_i \ln(\lambda x_i + 1) + (a-1)\ln\alpha + (p-1)\ln\lambda - b\alpha - q\lambda \Big]. $$
Now, we obtain $|\Sigma|$ as follows:
$$ |\Sigma| = \left[ \frac{\partial^2 o}{\partial \alpha^2}\,\frac{\partial^2 o}{\partial \lambda^2} - \left( \frac{\partial^2 o}{\partial \alpha\, \partial \lambda} \right)^2 \right]^{-1}_{\alpha = \hat{\alpha}_1,\ \lambda = \hat{\lambda}_1}, $$
where $\partial^2 o / \partial \alpha^2$, $\partial^2 o / \partial \alpha \partial \lambda$, and $\partial^2 o / \partial \lambda^2$ are provided in Appendix A.
In line with the steps outlined above, the elements of $\Sigma^*$ are provided in Appendix A. As a result, we can derive $|\Sigma^*|$ as follows:
$$ |\Sigma^*| = \left[ \frac{\partial^2 o^*}{\partial \alpha^2}\,\frac{\partial^2 o^*}{\partial \lambda^2} - \left( \frac{\partial^2 o^*}{\partial \alpha\, \partial \lambda} \right)^2 \right]^{-1}_{\alpha = \hat{\alpha}_u,\ \lambda = \hat{\lambda}_u}. $$
Based on the calculations outlined above, we now take $y(\alpha, \lambda) = \alpha$ and $y(\alpha, \lambda) = \lambda$ in turn. The Bayes estimates under the SEL function are
$$ \hat{\alpha}_{\mathrm{SEL}} = \sqrt{\frac{|\Sigma_\alpha^*|}{|\Sigma|}}\, e^{n\left[ o_\alpha^*(\hat{\alpha}_\alpha, \hat{\lambda}_\alpha) - o(\hat{\alpha}_1, \hat{\lambda}_1) \right]}, $$
$$ \hat{\lambda}_{\mathrm{SEL}} = \sqrt{\frac{|\Sigma_\lambda^*|}{|\Sigma|}}\, e^{n\left[ o_\lambda^*(\hat{\alpha}_\lambda, \hat{\lambda}_\lambda) - o(\hat{\alpha}_1, \hat{\lambda}_1) \right]}. $$
Further investigation into Bayesian estimation allows us to derive estimates under the BSEL function, namely $\hat{\sigma}_{\mathrm{BSEL}} = \gamma \sigma^* + (1 - \gamma) \hat{\sigma}_{\mathrm{SEL}}$, where $0 \le \gamma \le 1$ and $\sigma^*$ is the corresponding maximum likelihood estimate.
In a similar manner, we derive Bayesian estimations utilizing the LINEX loss function:
$$ \hat{\alpha}_{\mathrm{LINEX}} = -\frac{1}{k} \ln\left\{ \sqrt{\frac{|\Sigma_{y_1}^*|}{|\Sigma|}}\, e^{n\left[ o_{y_1}^*(\hat{\alpha}_{u_1}, \hat{\lambda}_{u_1}) - o(\hat{\alpha}_1, \hat{\lambda}_1) \right]} \right\}, $$
$$ \hat{\lambda}_{\mathrm{LINEX}} = -\frac{1}{k} \ln\left\{ \sqrt{\frac{|\Sigma_{y_2}^*|}{|\Sigma|}}\, e^{n\left[ o_{y_2}^*(\hat{\alpha}_{u_2}, \hat{\lambda}_{u_2}) - o(\hat{\alpha}_1, \hat{\lambda}_1) \right]} \right\}, $$
where we take $y_1(\alpha, \lambda) = e^{-k\alpha}$ and $y_2(\alpha, \lambda) = e^{-k\lambda}$.
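The mechanics of the TK approximation are easiest to check in one dimension, where the maximizations and curvatures are scalar. In the sketch below, the golden-section search and finite-difference second derivatives are implementation conveniences, not part of the method itself:

```python
import math

def tk_expectation(log_post, y, theta0, n=1.0, h=1e-5):
    """One-parameter Tierney-Kadane sketch for E[y(theta) | x].

    log_post(theta) plays the role of n*o(theta) = log prior + log likelihood.
    The method maximizes o and o* = o + ln(y)/n, measures the curvature at
    each maximum, and returns sqrt(var* / var) * exp(n*(o* - o)).
    """
    def maximize(f, lo, hi, iters=200):
        # golden-section search for the maximizer of a unimodal f on [lo, hi]
        g = (math.sqrt(5.0) - 1.0) / 2.0
        a, b = lo, hi
        c, d = b - g * (b - a), a + g * (b - a)
        for _ in range(iters):
            if f(c) > f(d):
                b, d = d, c
                c = b - g * (b - a)
            else:
                a, c = c, d
                d = a + g * (b - a)
        return 0.5 * (a + b)

    o = lambda t: log_post(t) / n
    o_star = lambda t: o(t) + math.log(y(t)) / n
    t1 = maximize(o, 1e-6, 10.0 * theta0)
    t2 = maximize(o_star, 1e-6, 10.0 * theta0)
    curv = lambda f, t: (f(t - h) - 2.0 * f(t) + f(t + h)) / (h * h)
    var1 = -1.0 / (n * curv(o, t1))       # variance term at the maximizer of o
    var2 = -1.0 / (n * curv(o_star, t2))  # variance term at the maximizer of o*
    return math.sqrt(var2 / var1) * math.exp(n * (o_star(t2) - o(t1)))
```

For example, with $n\,o(\theta) = 19\ln\theta - 4\theta$ (a Gamma(20, 4) posterior) and $y(\theta) = \theta$, the returned value is within about 0.1% of the exact posterior mean 5.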

4.3. Metropolis–Hastings Algorithm

In this section, we will compare various Bayesian estimation methods by focusing on the Metropolis–Hastings algorithm. As noted by [22], the MH algorithm is a simulation-based Markov Chain Monte Carlo technique, primarily utilized for sampling from a specified probability distribution. The core idea is to construct a Markov Chain whose stationary distribution corresponds to the desired probability density.
To estimate the parameters using Bayesian methods, we begin by assuming that a bivariate normal distribution is an appropriate proposal for these parameters. The MH algorithm is subsequently applied to generate samples from this bivariate normal proposal. Furthermore, Bayesian estimates are derived under the different loss functions, and the highest posterior density (HPD) credible intervals are established. The detailed steps of this process are outlined in Algorithm 1.
Algorithm 1 MH algorithm
Step 1: Select an initial value of $(\alpha, \lambda)$, denoted $\theta_0 = (\alpha_0, \lambda_0)$.
Step 2: Generate a proposal $\theta' = (\alpha', \lambda')$ from the bivariate normal distribution $N_2(\theta_{n-1}, \hat{\Sigma})$, where $\hat{\Sigma}$ represents the variance–covariance matrix, typically taken as the inverse of the Fisher information matrix, and $\theta_{n-1} = (\alpha_{n-1}, \lambda_{n-1})$.
Step 3: Calculate $p = \min\left( 1, \frac{\pi(\theta' \mid X)}{\pi(\theta_{n-1} \mid X)} \right)$, where $\pi(\cdot \mid X)$ denotes the posterior distribution of $(\alpha, \lambda)$.
Step 4: Generate $u$ from the uniform distribution $U(0, 1)$.
Step 5: If $u \le p$, set $\theta_n = \theta'$; otherwise, set $\theta_n = \theta_{n-1}$.
Step 6: Repeat the steps outlined above a total of $M$ times to obtain the desired number of samples.
We discard the initial $M_I$ iterations as burn-in. Following the aforementioned steps, the Bayesian estimations utilizing the SEL function can be computed as follows:
$$ \tilde{\alpha}_{\mathrm{SEL}} = \frac{1}{M - M_I} \sum_{n=M_I+1}^{M} \alpha_n, \qquad \tilde{\lambda}_{\mathrm{SEL}} = \frac{1}{M - M_I} \sum_{n=M_I+1}^{M} \lambda_n. $$
Based on the aforementioned steps, the Bayesian estimations under the LINEX loss function can be calculated as follows:
$$ \tilde{\alpha}_{\mathrm{LINEX}} = -\frac{1}{k} \ln\left[ \frac{1}{M - M_I} \sum_{i=M_I+1}^{M} e^{-k\alpha_i} \right], $$
$$ \tilde{\lambda}_{\mathrm{LINEX}} = -\frac{1}{k} \ln\left[ \frac{1}{M - M_I} \sum_{i=M_I+1}^{M} e^{-k\lambda_i} \right]. $$
Additionally, we will establish the $100(1-\tau)\%$ highest posterior density (HPD) interval for $\lambda$, denoted $(\lambda_{(1)}, \lambda_{(2)})$.
To establish the upper and lower bounds of the interval, let $\lambda_{(1)} = \lambda_{(i^*)}$ and $\lambda_{(2)} = \lambda_{(i^* + (1-\tau)(M - M_I))}$, where $i^*$ satisfies
$$ \lambda_{(i^* + (1-\tau)(M - M_I))} - \lambda_{(i^*)} = \min_{1 \le i \le \tau(M - M_I)} \left[ \lambda_{(i + (1-\tau)(M - M_I))} - \lambda_{(i)} \right]. $$
The HPD interval for α can be obtained in a similar manner.
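The HPD construction above reduces to a shortest-window search over the sorted draws; a minimal sketch:

```python
def hpd_interval(draws, tau=0.05):
    """100(1 - tau)% HPD interval from posterior draws: among all windows
    containing a fraction (1 - tau) of the sorted draws, return the shortest.
    """
    s = sorted(draws)
    keep = int((1.0 - tau) * len(s))  # number of draws inside the interval
    i_star = min(range(len(s) - keep), key=lambda i: s[i + keep] - s[i])
    return s[i_star], s[i_star + keep]
```

Because the search minimizes the window width, a single far-out draw is pushed outside the interval rather than stretching it, which is exactly the advantage of HPD intervals over equal-tail credible intervals for skewed posteriors.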

5. Bayesian Prediction

In this section, we utilize available information to generate estimations for future samples, as well as to construct corresponding prediction intervals. The ability to predict future samples is considered a crucial method in fields such as industrial, clinical, and agricultural experiments.

5.1. One-Sample Prediction

To obtain a one-sample prediction, we begin by assuming that the observed sample under the progressive Type-II censoring scheme is $X = (X_1, \ldots, X_m)$, derived from a generalized Pareto distribution with censoring scheme $(R_1, \ldots, R_m)$. We denote the failure times of a future sample of size $K$ as $W_1 \le W_2 \le \cdots \le W_K$. Additionally, let $W_r$ (for $1 \le r \le K$) represent the $r$-th failure of the future sample, which we aim to predict based on the observed samples, and let $w_r$ denote its ordered recorded value.
Based on the function presented in Equation (2), we can derive the corresponding conditional cumulative distribution function (CDF) as follows:
$$ F(w \mid X, \alpha, \lambda) = \int_{x_r}^{w} f(u \mid x, \alpha, \lambda)\, du = j \binom{R_r}{j} \sum_{n=0}^{j-1} \frac{(-1)^{j-1-n}}{R_r - n} \binom{j-1}{n} \left[ 1 - \left( 1 - F(x_r) \right)^{n - R_r} \left( 1 - F(w) \right)^{R_r - n} \right]. $$
We then obtain the predictive survival function:
$$ H(h \mid \hat{x}, \alpha, \lambda) = \frac{P(w > h \mid x, \alpha, \lambda)}{P(w > x_r \mid x, \alpha, \lambda)} = \frac{\int_h^\infty f(u \mid x, \alpha, \lambda)\, du}{\int_{x_r}^\infty f(u \mid x, \alpha, \lambda)\, du}. $$
Utilizing the posterior distribution of $\alpha$ and $\lambda$, we derive the posterior predictive cumulative distribution function $G(w \mid \hat{x})$, along with the associated survival function $S^*(w \mid \hat{x})$:
$$ G(w \mid \hat{x}) = \iint_{\alpha, \lambda} F(w \mid \hat{x}, \alpha, \lambda)\, \pi(\alpha, \lambda \mid \hat{x})\, d\alpha\, d\lambda, $$
$$ S^*(w \mid \hat{x}) = \iint_{\alpha, \lambda} S(w \mid \hat{x}, \alpha, \lambda)\, \pi(\alpha, \lambda \mid \hat{x})\, d\alpha\, d\lambda. $$
Next, we will derive the Bayesian prediction interval $(L_1, U_1)$, which corresponds to a confidence level of $100(1-\tau)\%$:
$$ S^*(L_1 \mid \hat{x}) = 1 - \frac{\tau}{2}, \qquad S^*(U_1 \mid \hat{x}) = \frac{\tau}{2}. $$
Following this, we can generate the predictive estimation for the future r-th ordered lifetime:
$$ \hat{w}_r = \int_0^\infty \int_0^\infty H(x_r \mid \alpha, \lambda)\, \pi(\alpha, \lambda \mid x)\, d\lambda\, d\alpha, $$
where
$$ H(x_r \mid \alpha, \lambda) = \int_{x_r}^{\infty} u f(u)\, du. $$
We have observed that the integrals mentioned above cannot be solved analytically. Utilizing the MH algorithm, we derive the predictive estimation of W r as follows:
$$ \hat{w}_r = \frac{1}{M - M_I} \sum_{i=M_I+1}^{M} H(x_r \mid \alpha_i, \lambda_i). $$

5.2. Two-Sample Prediction

Firstly, we consider a censored sample $X = (X_1, \ldots, X_m)$ of size $m$, drawn from a population characterized by the generalized Pareto distribution. We also denote $(W_1, \ldots, W_K)$ as the ordered failure times of a future sample, where $W_1 \le W_2 \le \cdots \le W_K$, and we define $W_r$ as the $r$-th failure time from this future sample. The density function of $W_r$ is outlined as follows:
$$ f(w_r \mid \alpha, \lambda) = r \binom{K}{r} f(w_r \mid \alpha, \lambda)\, [1 - F(w_r \mid \alpha, \lambda)]^{K-r}\, [F(w_r \mid \alpha, \lambda)]^{r-1} = r \binom{K}{r} f(w_r \mid \alpha, \lambda) \sum_{j=0}^{r-1} \binom{r-1}{j} (-1)^j\, [1 - F(w_r \mid \alpha, \lambda)]^{K-r+j}. $$
Then, the posterior predictive density function of $W_r$ is as follows:
$$ f^*(w_r \mid \hat{x}) = \int_0^\infty \int_0^\infty \pi(\alpha, \lambda \mid x)\, f(w_r \mid \alpha, \lambda)\, d\alpha\, d\lambda. $$
Utilizing the MH algorithm, the approximate result can be efficiently calculated as follows:
$$ f^*(w_r \mid \hat{x}) \approx \frac{1}{M - M_I} \sum_{j=M_I+1}^{M} f(w_r \mid \alpha_j, \lambda_j). $$
Furthermore, we derive the posterior survival function of $w_r$:
$$ S^*(w_r \mid \hat{x}) = \int_0^\infty \int_0^\infty S(w_r \mid \alpha, \lambda)\, \pi(\alpha, \lambda \mid x)\, d\alpha\, d\lambda \approx \frac{1}{M - M_I} \sum_{j=M_I+1}^{M} \int_{w_r}^{\infty} f(u \mid \alpha_j, \lambda_j)\, du, $$
where
$$ S(w_r \mid \alpha, \lambda) = \int_{w_r}^{\infty} f(u \mid \alpha, \lambda)\, du. $$
A Bayesian predictive interval for $w$, denoted as $(L_2, U_2)$, is now constructed at the $100(1-\tau)\%$ level by obtaining the solutions to the following equations:
$$ S^*(L_2 \mid \hat{x}) = 1 - \frac{\tau}{2}, \qquad S^*(U_2 \mid \hat{x}) = \frac{\tau}{2}. $$
The next step involves obtaining the predictive estimation of the $r$-th future ordered lifetime as follows:
$$ \hat{w}_r = \int_0^\infty w_r\, f^*(w_r \mid \hat{x})\, dw_r = \int_0^\infty \int_0^\infty \int_0^\infty w_r\, f(w_r \mid \alpha, \lambda)\, \pi(\alpha, \lambda \mid \hat{x})\, dw_r\, d\alpha\, d\lambda. $$
The predictive estimation of $W_r$ is derived using the MH algorithm as follows:
$$ \hat{w}_r \approx \frac{1}{M - M_I} \sum_{j=M_I+1}^{M} \int_0^\infty w_r\, f(w_r \mid \alpha_j, \lambda_j)\, dw_r. $$

6. Simulation

Utilizing the methodology presented by [2], the impact of the proposed estimation and prediction techniques has been simulated and assessed. Initially, the methods are outlined in Algorithm 2:
Algorithm 2 Sample generation
Step 1: Generate $U_1, \ldots, U_m$ from the uniform distribution $U(0, 1)$.
Step 2: Given the censoring scheme $R = (R_1, \ldots, R_m)$, let $B_j = U_j^{1/(j + R_m + R_{m-1} + \cdots + R_{m-j+1})}$, where $j = 1, \ldots, m$.
Step 3: Let $K_j = 1 - B_m B_{m-1} \cdots B_{m-j+1}$; then $K_1, K_2, \ldots, K_m$ constitute a sample from $U(0, 1)$ under the progressive Type-II censoring scheme.
Step 4: Let $X_j = F^{-1}(K_j)$, $j = 1, \ldots, m$, where $F(x)$ stands for the CDF of the generalized Pareto distribution.
In our study, we obtain the required censored data from the generalized Pareto distribution as defined, denoting the data points as X ( 1 ) , , X ( r ) , , X ( m ) . We utilize the true values of parameters α , λ , specifically set at ( 0.35 , 1.5 ) . Maximum likelihood estimates are calculated using the expectation–maximization method. For Bayesian estimation and prediction purposes, we select the parameters a , b , p , q with values of ( 0.2 , 7.8 , 0.1 , 3.7 ) , respectively. Additionally, we conduct Bayesian estimations using different loss functions, namely SEL , LINEX , and BSEL , employing the TK and Metropolis–Hastings approaches.
We denote the censoring scheme compactly by $V$. For instance, if $m = 5$ and $n = 9$, the censoring scheme $(0, 0, 0, 0, 4)$ is recorded as $V = (0*4, 4)$. We will utilize the following ten censoring schemes for our simulations.
Based on the schemes presented in Table 1, we will conduct a simulation study. The results are compiled and displayed below.
In Table 2, we present the results of our numerical experiments conducted for each simulation. We report the maximum likelihood estimation and Bayesian estimation utilizing the TK method and the Metropolis–Hastings algorithm. Additionally, we provide the mean square error (MSE) values, which serve as a basis for comparing the results.
In the table, for each censoring scheme, the estimated values are displayed in the first and third rows, while the second and fourth rows contain the corresponding MSE values. The fifth and sixth columns present the results obtained from the MH method using the BSEL function. The seventh and eighth columns illustrate the results from the MH method employing the LINEX function. The tenth and eleventh columns show the outcomes using the TK method under the BSEL function, and the twelfth and thirteenth columns provide the results from the TK method under the LINEX function.
It is noteworthy that the MH method yields smaller MSEs than the TK approach. In terms of MSE, the Bayesian estimates under the LINEX function have lower MSEs than those under the SEL and BSEL functions, although the estimates under SEL are closer to the true values. Regarding the MLEs, larger sample sizes (n and m denote the total and observed sample sizes, respectively) produce more accurate estimates. Overall, the findings indicate that Bayesian estimation has a clear advantage over maximum likelihood estimation.
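For reference, a minimal random-walk Metropolis–Hastings sampler for $(\alpha, \lambda)$ can be sketched as follows. The log-posterior combines the progressive Type-II censored GPD likelihood $\prod_i \alpha\lambda(1+\lambda x_i)^{-(\alpha(R_i+1)+1)}$ with independent gamma priors $\mathrm{Gamma}(a,b)$ on $\alpha$ and $\mathrm{Gamma}(p,q)$ on $\lambda$; the starting point, proposal step size, and function names are our assumptions, not taken from the paper:

```python
import numpy as np

def log_post(theta, x, R, a, b, p, q):
    """Log-posterior (up to a constant) for the GPD
    f(x) = alpha*lam*(1+lam*x)^-(alpha+1) under progressive
    Type-II censoring with scheme R and gamma priors."""
    alpha, lam = theta
    if alpha <= 0 or lam <= 0:
        return -np.inf
    m = len(x)
    loglik = m * np.log(alpha * lam) - np.sum((alpha * (R + 1) + 1) * np.log1p(lam * x))
    logprior = (a - 1) * np.log(alpha) - b * alpha + (p - 1) * np.log(lam) - q * lam
    return loglik + logprior

def mh_sample(x, R, a, b, p, q, n_iter=5000, step=0.05, seed=0):
    """Random-walk Metropolis-Hastings on (alpha, lambda)."""
    rng = np.random.default_rng(seed)
    theta = np.array([0.5, 1.0])                  # arbitrary starting point
    lp = log_post(theta, x, R, a, b, p, q)
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        prop = theta + step * rng.standard_normal(2)
        lp_prop = log_post(prop, x, R, a, b, p, q)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept with prob min(1, ratio)
            theta, lp = prop, lp_prop
        draws[t] = theta
    return draws
```

In practice an initial burn-in portion of the chain would be discarded before computing the estimates above.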
In Table 3, we present the various intervals constructed using the MH algorithm, which include average credible intervals and Highest Posterior Density (HPD) intervals. These intervals are evaluated based on their average length (AL) and coverage probabilities (CP). In summary, Bayesian estimations demonstrate a clear advantage over maximum likelihood estimations. Additionally, we observe that the ALs for the HPD intervals are shorter. The results are detailed below:
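As one concrete way to obtain an HPD interval from the MH output, the Chen–Shao approach takes the shortest interval among all those spanning a fixed number of consecutive sorted draws. This sketch (ours, for a scalar parameter) illustrates why HPD intervals are never longer than equal-tail credible intervals at the same level:

```python
import numpy as np

def hpd_interval(samples, cred=0.95):
    """Shortest interval containing a `cred` fraction of the posterior
    draws: among all intervals covering ceil(cred*N) consecutive sorted
    samples, pick the narrowest (Chen-Shao method)."""
    s = np.sort(np.asarray(samples))
    n = len(s)
    k = int(np.ceil(cred * n))
    widths = s[k - 1:] - s[:n - k + 1]   # width of every candidate interval
    j = int(np.argmin(widths))
    return s[j], s[j + k - 1]
```

For a symmetric posterior the HPD and equal-tail intervals coincide; for skewed posteriors, such as those of $\alpha$ and $\lambda$ here, the HPD interval shifts toward the mode and shortens.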
In Table 4, we present the point predictions alongside the corresponding 95% prediction intervals. These predictions are generated for the values $y_3$, $y_7$, and $y_{10}$ (collectively referred to as $y_p$) within a future dataset of size 10. Overall, the prediction intervals widen as $p$ increases.
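The two-sample prediction can be mimicked by posterior-predictive simulation (our illustration, not the paper's closed-form predictive density): for each posterior draw of $(\alpha, \lambda)$, simulate a future sample of size $K$ and record its $p$-th order statistic, then summarize the simulated values:

```python
import numpy as np

def predict_future_order_stat(draws, p, K=10, level=0.95, seed=0):
    """Posterior-predictive point prediction and interval for the p-th
    order statistic of a future GPD sample of size K. For each posterior
    draw (alpha, lam), draw a future sample via the inverse CDF and
    take its p-th smallest value."""
    rng = np.random.default_rng(seed)
    ys = []
    for alpha, lam in np.asarray(draws):
        u = rng.uniform(size=K)
        x = ((1.0 - u) ** (-1.0 / alpha) - 1.0) / lam   # inverse-CDF draws
        ys.append(np.sort(x)[p - 1])
    ys = np.asarray(ys)
    lo, hi = np.quantile(ys, [(1 - level) / 2, (1 + level) / 2])
    return ys.mean(), (lo, hi)
```

Because higher-order statistics of a heavy-tailed sample are more variable, this construction reproduces the widening of the intervals with $p$ noted above.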

7. Real Data Analysis

Following [23], we apply the aforementioned methods to a real dataset of 50 observations, listed in Table 5:
This dataset presents the cluster maxima of daily ozone concentrations exceeding 0.11 ppm during the summer months at the Pedregal station from 2002 to 2007. This research is highly relevant to efforts aimed at protecting public health in urban areas.
To conduct the analysis, we first calculate the maximum likelihood estimates for two parameters, α and λ . Following this, we perform a goodness-of-fit assessment for the generalized Pareto distribution using several practical criteria, including the Bayesian Information Criterion (BIC), the Akaike Information Criterion (AIC), and Kolmogorov–Smirnov statistics (K-S). Additionally, for comparative purposes, we evaluate alternative distributions such as the Shifted Exponential Distribution (SED) and the Inverse Weibull Distribution (IWD).
The PDF of SED is as follows:
f_{SED}(x \mid \alpha, \lambda) = \frac{1}{\lambda} e^{-(x-\alpha)/\lambda}, \quad \alpha, \lambda > 0, \; x \geq \alpha.
The PDF of IWD is as follows:
f_{IWD}(x \mid \alpha, \lambda) = \lambda \alpha x^{-(\alpha+1)} e^{-\lambda x^{-\alpha}}, \quad \alpha, \lambda > 0, \; x > 0.
The statistical results and the maximum likelihood estimates are presented in Table 6. The findings indicate that the generalized Pareto distribution is the most appropriate model.
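As a sketch, the selection criteria in the table can be computed from the fitted log-likelihood of the GPD $f(x\mid\alpha,\lambda)=\alpha\lambda(1+\lambda x)^{-(\alpha+1)}$. The function name and the complete-sample form of the K-S statistic are illustrative assumptions:

```python
import numpy as np

def gpd_fit_criteria(x, alpha, lam):
    """AIC, BIC and Kolmogorov-Smirnov statistic for the GPD
    f(x|alpha,lam) = alpha*lam*(1 + lam*x)^-(alpha+1), evaluated at
    given (e.g., maximum likelihood) parameter estimates."""
    x = np.sort(np.asarray(x))
    n = len(x)
    loglik = np.sum(np.log(alpha * lam) - (alpha + 1) * np.log1p(lam * x))
    k = 2                                   # number of fitted parameters
    aic = 2 * k - 2 * loglik
    bic = k * np.log(n) - 2 * loglik
    F = 1.0 - (1.0 + lam * x) ** (-alpha)   # model CDF at the sorted data
    i = np.arange(1, n + 1)
    ks = max(np.max(i / n - F), np.max(F - (i - 1) / n))
    return aic, bic, ks
```

The same routine, with the SED or IWD log-likelihood and CDF substituted, yields the competing rows of the table; the model with the smallest criteria is preferred.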
To illustrate the aforementioned methods, we present two schemes of censored samples obtained from the dataset under progressive Type-II censoring.
Scheme 1: (0*15, 3, 0, 1, 1, 0*17, 3, 0*2, 2)
Scheme 2: (0*30, 1, 1, 1, 1, 1, 0*10)
The maximum likelihood estimates obtained using the expectation–maximization algorithm, the Bayesian estimations utilizing the TK and Metropolis–Hastings methods, and the Bayesian predictions are presented in Table 7, Table 8 and Table 9.
We have computed maximum likelihood estimates employing the expectation–maximization algorithm, and Bayesian estimates using the TK and Metropolis–Hastings methods. In the absence of prior information, all hyperparameters were set to values close to zero.
The results for both the MLEs and the Bayesian estimates are presented in Tables 7 and 8. Furthermore, Table 9 provides Bayesian point predictions along with 95% Bayesian interval predictions for $y_2$ and $y_6$ in a future sample of size $K = 10$.

8. Conclusions

In conclusion, this study examines censored data using both classical and Bayesian inference methods, specifically focusing on progressive Type-II censoring from the generalized Pareto distribution. Initially, maximum likelihood estimates are derived utilizing the expectation–maximization algorithm. Bayesian statistical methods are subsequently employed, utilizing three distinct loss functions (SEL, LINEX, and BSEL) for parameter estimation, with the TK method applied to manage the complexities associated with the posterior expectation. The Metropolis–Hastings algorithm is utilized for deriving Bayesian estimates and highest posterior density (HPD) intervals, as well as for making predictions regarding future samples. A simulation study is conducted to evaluate the effectiveness of these methodologies, alongside an analysis of a practical dataset. Furthermore, these approaches hold the potential for application to other distributions, such as the Gompertz and Weibull distributions, and may also facilitate further exploration of Bayesian prediction for the generalized Pareto distribution in the context of general progressive censoring in future research initiatives.
This study relies on the assumption that the data follow a Generalized Pareto Distribution (GPD). If the underlying data do not conform to this distribution, the estimates and predictions may be inaccurate. The use of a progressive Type-II censoring scheme may limit the generalizability of the findings. Different censoring schemes could yield different results, and the implications of using this specific scheme should be carefully considered. The effectiveness of the estimation and prediction methods may be sensitive to sample size. Small sample sizes can lead to unreliable estimates and wider credible intervals, affecting the precision of predictions.
The findings can be applied in various fields such as finance, environmental science, and reliability engineering, where modeling extreme values is crucial. The ability to make Bayesian predictions enhances decision-making under uncertainty. This work contributes to the statistical literature by providing a framework for Bayesian estimation and prediction in the context of the GPD. It encourages further exploration of Bayesian methods in extreme value theory. The limitations identified in this study open avenues for future research. For instance, exploring alternative censoring schemes, different prior distributions, or extending the model to accommodate other distributions could enhance the robustness of the findings. The ability to estimate and predict using the GPD can improve risk assessment strategies in various industries, allowing for better management of extreme events and their potential impacts.

Author Contributions

Investigation, T.Y.; supervision, W.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in [23].

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

(1) The expressions for $C_{11}$, $C_{12}$, $C_{21}$, and $C_{22}$ are provided below.
C_{11} = -E\left[ \frac{\partial^2 L_c}{\partial \alpha^2} \right] = \frac{n}{\alpha^2},
C_{12} = C_{21} = -E\left[ \frac{\partial^2 L_c}{\partial \alpha \, \partial \lambda} \right] = \frac{n}{\lambda(\alpha+1)},
C_{22} = -E\left[ \frac{\partial^2 L_c}{\partial \lambda^2} \right] = \frac{n\alpha}{\lambda^2(\alpha+2)}.
(2) The representations of $D_{11}$, $D_{12}$, $D_{21}$, and $D_{22}$ are provided below.
D_{11} = \frac{1}{\alpha^2},
D_{12} = D_{21} = \frac{z_i}{1+\lambda z_i} - \frac{x_{(i)}}{1+\lambda x_{(i)}},
D_{22} = \frac{1}{\lambda^2} - (\alpha+1)\frac{z_i^2}{(1+\lambda z_i)^2} + \alpha \frac{x_{(i)}^2}{(1+\lambda x_{(i)})^2}.
(3) The elements of $\Sigma$ are provided below.
\frac{\partial o}{\partial \alpha} = \frac{1}{n}\left[ \frac{m}{\alpha} - \sum_{i=1}^{m}(R_i+1)\ln(\lambda x_i+1) + \frac{a-1}{\alpha} - b \right],
\frac{\partial o}{\partial \lambda} = \frac{1}{n}\left[ \frac{m}{\lambda} - (\alpha+1)\sum_{i=1}^{m}\frac{x_i}{\lambda x_i+1} - \alpha\sum_{i=1}^{m}\frac{R_i x_i}{\lambda x_i+1} + \frac{p-1}{\lambda} - q \right],
\frac{\partial^2 o}{\partial \alpha^2} = \frac{1}{n}\left[ -\frac{m}{\alpha^2} - \frac{a-1}{\alpha^2} \right],
\frac{\partial^2 o}{\partial \lambda^2} = \frac{1}{n}\left[ -\frac{m}{\lambda^2} + \sum_{i=1}^{m}\frac{(\alpha+1)x_i^2}{(\lambda x_i+1)^2} + \alpha\sum_{i=1}^{m}\frac{R_i x_i^2}{(\lambda x_i+1)^2} - \frac{p-1}{\lambda^2} \right],
(4) The elements of $\Sigma^*$ are provided below.
\frac{\partial o^*}{\partial \alpha} = \frac{\partial o}{\partial \alpha} + \frac{\partial y/\partial \alpha}{n\, y(\alpha,\lambda)},
\frac{\partial o^*}{\partial \lambda} = \frac{\partial o}{\partial \lambda} + \frac{\partial y/\partial \lambda}{n\, y(\alpha,\lambda)},
\frac{\partial^2 o^*}{\partial \alpha^2} = \frac{\partial^2 o}{\partial \alpha^2} + \frac{1}{n} \cdot \frac{y(\alpha,\lambda)\,\dfrac{\partial^2 y}{\partial \alpha^2} - \left(\dfrac{\partial y}{\partial \alpha}\right)^2}{[y(\alpha,\lambda)]^2},
\frac{\partial^2 o^*}{\partial \lambda^2} = \frac{\partial^2 o}{\partial \lambda^2} + \frac{1}{n} \cdot \frac{y(\alpha,\lambda)\,\dfrac{\partial^2 y}{\partial \lambda^2} - \left(\dfrac{\partial y}{\partial \lambda}\right)^2}{[y(\alpha,\lambda)]^2},
\frac{\partial^2 o^*}{\partial \alpha\,\partial \lambda} = \frac{\partial^2 o}{\partial \alpha\,\partial \lambda} + \frac{1}{n} \cdot \frac{y(\alpha,\lambda)\,\dfrac{\partial^2 y}{\partial \alpha\,\partial \lambda} - \dfrac{\partial y}{\partial \alpha}\dfrac{\partial y}{\partial \lambda}}{[y(\alpha,\lambda)]^2} = \frac{\partial^2 o^*}{\partial \lambda\,\partial \alpha}.
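Under the expected information matrix with entries $C_{11}$, $C_{12} = C_{21}$, and $C_{22}$ given in this Appendix, the asymptotic confidence intervals used in the simulations follow by inverting the matrix and applying the normal approximation. A sketch (the function name is ours, and the entries are as reconstructed above):

```python
import numpy as np

def asymptotic_ci_95(alpha, lam, n):
    """95% asymptotic CIs for (alpha, lambda) from the expected
    information matrix C with C11 = n/alpha^2,
    C12 = C21 = n/(lam*(alpha+1)), C22 = n*alpha/(lam^2*(alpha+2)),
    using estimate +/- 1.96 * sqrt(diag(C^-1))."""
    C = np.array([
        [n / alpha**2,            n / (lam * (alpha + 1))],
        [n / (lam * (alpha + 1)), n * alpha / (lam**2 * (alpha + 2))],
    ])
    se = np.sqrt(np.diag(np.linalg.inv(C)))
    z = 1.96                                  # Phi^{-1}(0.975)
    return ((alpha - z * se[0], alpha + z * se[0]),
            (lam - z * se[1], lam + z * se[1]))
```

In practice the MLEs are plugged in for $(\alpha, \lambda)$, which is how the ACI columns of Table 3 would be produced.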

References

  1. Mendenhall, W.; Hader, R.J. Estimation of Parameters of Mixed Exponentially Distributed Failure Time Distributions from Censored Life Test Data. Biometrika 1958, 45, 504–520. [Google Scholar]
  2. Aggarwala, R.; Balakrishnan, N. Some Properties of Progressive Censored Order Statistics from Arbitrary and Uniform Distributions with Applications to Inference and Simulation. J. Stat. Plan. Inference 1998, 70, 35–49. [Google Scholar] [CrossRef]
  3. Balakrishnan, N.; Cramer, E. Linear Estimation in Progressive Type-II Censoring; Springer: New York, NY, USA, 2014. [Google Scholar]
  4. Li, Q.; Wu, D. Bayesian Analysis of Rayleigh Distribution under Progressive Type-II Censoring. J. Shanghai Polytech. Univ. 2019, 36, 114–117. [Google Scholar]
  5. Holmes, J.D.; Moriarty, W.W. Application of the Generalized Pareto Distribution to Extreme Value Analysis in Wind Engineering. J. Wind. Eng. Ind. Aerodyn. 1999, 83, 1–10. [Google Scholar] [CrossRef]
  6. Brabson, B.B.; Palutikof, J.P. Tests of the Generalized Pareto Distribution for Predicting Extreme Wind Speeds. J. Appl. Meteorol. 2000, 39, 1627–1641. [Google Scholar] [CrossRef]
  7. Bermudez, P.; Kotz, S. Parameter Estimation of the Generalized Pareto Distribution—Part I. J. Stat. Plan. Inference 2010, 140, 1353–1373. [Google Scholar] [CrossRef]
  8. Juárez, S.F.; Schucany, W.R. Robust and Efficient Estimation for the Generalized Pareto Distribution. Extremes 2004, 7, 237–251. [Google Scholar] [CrossRef]
  9. Cheng, C.H. Statistical Inference on Generalized Pareto Distribution with Progressive Type-I Censoring Scheme. J. Lanzhou Univ. 2013, 50, 260–265. [Google Scholar]
  10. Bermudez, P.; Turkman, M. Bayesian Approach to Parameter Estimation of the Generalized Pareto Distribution. Test 2003, 12, 259–277. [Google Scholar] [CrossRef]
  11. Hu, X.; Gui, W. Bayesian and Non-Bayesian Inference for the Generalized Pareto Distribution Based on Progressive Type-II Censored Sample. Mathematics 2018, 6, 319. [Google Scholar] [CrossRef]
  12. Pandey, H.; Rao, A.K. Bayesian Estimation of the Shape Parameter of a Generalized Pareto Distribution under Asymmetric Loss Functions. Hacet. Univ. Bull. Nat. Sci. Eng. 2009, 38, 69–83. [Google Scholar]
  13. Srivastava, R.S.; Kumar, V.; Rao, A.K. Bayesian Estimation of Shape Parameter and Reliability Function of Generalized Pareto Distribution Using the Linex Loss Function with Censoring. Hacet. J. Math. Stat. 2004, 14, 81–93. [Google Scholar]
  14. Gettinby, G.D.; Sinclair, C.D.; Power, D.M.; Brown, R.A. An Analysis of the Distribution of Extremes in Indices of Share Returns in the US, UK and Japan from 1963 to 2000. Int. J. Financ. Econ. 2010, 11, 97–113. [Google Scholar] [CrossRef]
  15. Pakyari, R.; Nia, K.R. Testing goodness-of-fit for some lifetime distributions with conventional Type-I censoring. Commun. Stat. Simul. Comput. 2017, 46, 2998–3009. [Google Scholar] [CrossRef]
  16. Feroze, N.; Aslam, M.; Saleem, A. Bayesian Estimation and Prediction of Generalized Pareto Distribution Based on Type II Censored Samples. J. Stat. 2015, 22, 139–165. [Google Scholar]
  17. Gauderman, W.J.; Navidi, W. A Monte Carlo Newton-Raphson Procedure for Maximizing Complex Likelihoods on Pedigree Data. Comput. Stat. Data Anal. 2001, 35, 395–415. [Google Scholar] [CrossRef]
  18. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum Likelihood from Incomplete Data via the EM Algorithm. J. R. Stat. Soc. Ser. B 1977, 39, 1–38. [Google Scholar] [CrossRef]
  19. Louis, T.A. Finding the Observed Information Matrix when Using the EM Algorithm. J. R. Stat. Soc. Ser. B 1982, 44, 226–233. [Google Scholar] [CrossRef]
  20. Tierney, L.; Kadane, J.B. Accurate Approximations for Posterior Moments and Marginal Densities. J. Am. Stat. Assoc. 1986, 81, 82–86. [Google Scholar] [CrossRef]
  21. Jozani, M.J.; Marchand, É.; Parsian, A. Bayesian and Robust Bayesian Analysis under a General Class of Balanced Loss Functions. Stat. Pap. 2012, 53, 51–60. [Google Scholar] [CrossRef]
  22. Singh, D.P.; Tripathi, Y.M.; Rastogi, M.K.; Dabral, N. Estimation and Prediction for a Burr Type-III Distribution with Progressive Censoring. Commun. Stat.-Theory Methods 2017, 46, 9591–9613. [Google Scholar] [CrossRef]
  23. Estrada, E.G.; Alva, J. Bootstrap Goodness of Fit Test for the Generalized Pareto Distribution. Comput. Stat. Data Anal. 2009, 53, 3835–3841. [Google Scholar]
Table 1. Censoring schemes.
V_1 = (0, 3, 0*8, 8, 0*6, 4, 0*12)    V_2 = (0*5, 7, 0*5, 6, 0*7, 2, 0*10)
V_3 = (0*2, 2, 0*4, 10, 0*5, 5, 0*12)    V_4 = (0*4, 6, 0*3, 7, 0*9, 4, 0*6)
V_5 = (0*4, 5, 0*3, 4, 0*4, 6, 0*6)    V_6 = (5, 3, 0*4, 2, 0*8, 5, 0*4)
V_7 = (0*2, 1, 2, 0*3, 7, 0*7)    V_8 = (2, 0*5, 5, 0*7, 3)
V_9 = (0*3, 2, 0*5, 5, 0)    V_10 = (0*2, 3, 0*3, 1, 2, 0*3)
Table 2. Estimators for parameters with MSEs (in brackets) under progressive Type-II censoring. In each cell, rows 1 and 2 correspond to the parameter α , while rows 3 and 4 pertain to the parameter λ .
(n, m)  Scheme V  MLE  |  MH: SEL  BSEL (γ = 0.3)  BSEL (γ = 0.7)  LINEX (k = −2)  LINEX (k = 2)  |  TK: SEL  BSEL (γ = 0.3)  BSEL (γ = 0.7)  LINEX (k = −2)  LINEX (k = 2)
(45,30) V_1  0.3526  0.3208  0.3400  0.3457  0.3550  0.3502  0.3382  0.3452  0.3476  0.3426  0.3531
(0.0446) (0.05781) (0.0456) (0.0462) (0.0356) (0.04837) (0.0509) (0.0623) (0.0486) (0.0643) (0.0508)
1.500  1.5038  1.4964  1.5104  1.4898  1.5078  1.5030  1.5018  1.4981  1.5064  1.4992
(0.0458) (0.0495) (0.0442) (0.0421) (0.0590) (0.0590) (0.0501) (0.0347) (0.0492) (0.0339) (0.0558)
(45,30) V_2  0.3517  0.3428  0.3562  0.3343  0.3395  0.3577  0.3418  0.3474  0.3568  0.3472  0.3577
(0.0598) (0.0334) (0.0512) (0.0413) (0.0588) (0.0477) (0.0433) (0.0523) (0.0716) (0.0438) (0.0453)
1.5070  1.5120  1.5000  1.4853  1.4745  1.5001  1.5161  1.4951  1.4855  1.4870  1.5043
(0.0633) (0.0483) (0.0535) (0.0621) (0.0370) (0.0521) (0.0535) (0.0520) (0.0594) (0.0751) (0.0300)
(42,25) V_3  0.3470  0.3527  0.3646  0.3586  0.3368  0.3475  0.3448  0.3361  0.3514  0.3352  0.3550
(0.0660) (0.0370) (0.0556) (0.0655) (0.0528) (0.0482) (0.0567) (0.0505) (0.0591) (0.0485) (0.0356)
1.5089  1.4874  1.496  1.5032  1.4920  1.4925  1.4992  1.5141  1.5012  1.4950  1.5099
(0.03842) (0.0601) (0.0492) (0.0537) (0.0243) (0.0412) (0.0378) (0.0405) (0.0572) (0.0501) (0.0369)
(42,25) V_4  0.3589  0.3334  0.3534  0.3689  0.3516  0.3458  0.3311  0.3504  0.3432  0.3541  0.3603
(0.0575) (0.0493) (0.0599) (0.0511) (0.0414) (0.0323) (0.0324) (0.0563) (0.0464) (0.0541) (0.0571)
1.5209  1.5097  1.4951  1.4916  1.4847  1.5137  1.4936  1.4815  1.4892  1.5045  1.5035
(0.0507) (0.0477) (0.0415) (0.0493) (0.0568) (0.0493) (0.0568) (0.0700) (0.0576) (0.0407) (0.0470)
(35,20) V_5  0.3511  0.3439  0.3252  0.3610  0.3614  0.3597  0.3590  0.3399  0.3674  0.3542  0.3404
(0.0592) (0.0502) (0.0428) (0.0642) (0.0584) (0.0440) (0.0488) (0.0486) (0.0314) (0.0598) (0.0391)
1.5017  1.480  1.4990  1.4887  1.4891  1.5082  1.5014  1.5153  1.4937  1.5209  1.5197
(0.0550) (0.0549) (0.0456) (0.0493) (0.0482) (0.0624) (0.0658) (0.0501) (0.0480) (0.0708) (0.0328)
(35,20) V_6  0.3371  0.3425  0.3553  0.34768  0.3428  0.3587  0.3405  0.3355  0.3348  0.3517  0.3535
(0.0540) (0.0327) (0.0395) (0.0392) (0.0409) (0.0604) (0.0488) (0.0490) (0.0757) (0.0533) (0.0628)
1.5133  1.5188  1.5078  1.5058  1.4923  1.500  1.5009  1.4867  1.4854  1.5029  1.5125
(0.04517) (0.0458) (0.0504) (0.0557) (0.0508) (0.0633) (0.0453) (0.0477) (0.0493) (0.0418) (0.0439)
(25,15) V_7  0.3454  0.3319  0.3470  0.3249  0.3398  0.3348  0.3581  0.3567  0.3561  0.3552  0.355
(0.0473) (0.0319) (0.0607) (0.0568) (0.0523) (0.0473) (0.0490) (0.0508) (0.0557) (0.0370) (0.0454)
1.4908  1.4876  1.4966  1.5092  1.4992  1.4989  1.4923  1.496  1.5004  1.4994  1.4987
(0.0591) (0.0367) (0.0353) (0.0270) (0.0464) (0.0719) (0.0454) (0.0536) (0.0537) (0.0609) (0.0358)
(25,15) V_8  0.3344  0.3630  0.3519  0.3575  0.3497  0.3428  0.3492  0.3359  0.3536  0.3470  0.3451
(0.0453) (0.0547) (0.0529) (0.0503) (0.0577) (0.0439) (0.0498) (0.0504) (0.0508) (0.0629) (0.0323)
1.5056  1.4989  1.5014  1.5025  1.5066  1.5169  1.498  1.5089  1.5020  1.4999  1.4955
(0.0748) (0.0368) (0.0545) (0.0491) (0.0418) (0.0412) (0.0566) (0.0416) (0.0379) (0.0540) (0.0395)
(18,11) V_9  0.3481  0.3712  0.3490  0.3353  0.3361  0.3392  0.3462  0.3368  0.3239  0.3445  0.3530
(0.0380) (0.0655) (0.0406) (0.0476) (0.0490) (0.0447) (0.0469) (0.0323) (0.0501) (0.0567) (0.0576)
1.5157  1.4985  1.5126  1.5147  1.4956  1.5077  1.5031  1.4880  1.4991  1.4943  1.5028
(0.04027) (0.0432) (0.0420) (0.0410) (0.0312) (0.0509) (0.0425) (0.0322) (0.0504) (0.0650) (0.0474)
(18,11) V_10  0.3553  0.3389  0.3471  0.3412  0.3429  0.3351  0.3494  0.3333  0.3505  0.3659  0.3458
(0.0642) (0.0413) (0.0559) (0.0351) (0.0540) (0.0647) (0.0485) (0.0585) (0.0461) (0.0387) (0.0618)
1.5027  1.4979  1.4987  1.5015  1.5126  1.512  1.4921  1.5013  1.5016  1.4992  1.5124
(0.0632) (0.0296) (0.0384) (0.0443) (0.0602) (0.0378) (0.04785) (0.0546) (0.0575) (0.0436) (0.0538)
Table 3. 95% interval estimations for $\alpha$ and $\lambda$. In each cell, the first row corresponds to the parameter $\alpha$, while the second row pertains to the parameter $\lambda$.
(n, m)  Scheme  ACI: AL  ACI: CP  HPD: AL  HPD: CP
(45,30) V_1  2.1636  0.7682  1.0595  0.8872
1.1655  0.7656  1.0395  0.9367
(45,30) V_2  2.3394  0.7849  0.0515  0.8834
1.7577  0.8874  1.0566  0.6837
(42,25) V_3  2.0246  0.9442  0.8070  0.7878
2.4179  0.7250  0.7243  0.9276
(42,25) V_4  1.6523  0.7135  0.9373  0.8765
2.2842  0.9160  0.5214  1.0472
(35,20) V_5  2.5520  0.7181  0.5298  0.8027
1.9482  0.7973  0.5524  0.9058
(35,20) V_6  1.8121  0.7410  0.9959  0.8625
1.7446  0.7843  0.7257  0.8592
(25,15) V_7  1.9155  0.7293  0.0014  0.7849
2.1602  0.7468  0.6998  0.9901
(25,15) V_8  1.8895  0.7697  0.6682  0.7958
1.7537  0.8029  0.8605  0.4644
(18,11) V_9  2.5341  0.8386  0.6161  0.8216
2.0619  0.7564  0.6915  0.7681
(18,11) V_10  2.1427  0.8070  0.7953  0.7851
1.9152  0.8186  1.2948  0.7192
Table 4. Point predictions (PPs) and 95% prediction intervals (PIs). The predictions are generated for the values $y_3$, $y_7$, and $y_{10}$ (collectively referred to as $y_p$), utilizing a future dataset of size $K = 10$.
(n, m)  Scheme  p  PP  PI
(45,30) V_1  3  0.5487  (−1.4254, 0.8255)
7  1.4205  (−0.5438, 2.9171)
10  2.2475  (2.2112, 2.6621)
(45,30) V_2  3  0.5972  (−0.4371, 2.6985)
7  1.3989  (0.3967, 3.1084)
10  2.2601  (1.2117, 3.1575)
(42,25) V_3  3  0.6666  (−0.5030, 0.8077)
7  1.5834  (1.3446, 2.0823)
10  2.2819  (1.3769, 2.7395)
(42,25) V_4  3  0.4173  (−0.8854, 1.3781)
7  1.3503  (1.2718, 2.4466)
10  2.3137  (1.1238, 3.2042)
(35,20) V_5  3  0.4638  (−0.8409, 0.6454)
7  1.3803  (1.0270, 3.0104)
10  2.5017  (2.0111, 3.0394)
(35,20) V_6  3  0.5590  (−0.6948, 2.1087)
7  1.3765  (1.3211, 2.6564)
10  2.5290  (2.4362, 3.0446)
(25,15) V_7  3  0.8869  (0.4920, 0.9623)
7  1.2612  (−0.9793, 1.8674)
10  2.3599  (2.1350, 4.0407)
(25,15) V_8  3  0.7029  (0.5576, 2.0648)
7  1.4934  (1.3333, 3.0181)
10  2.4397  (2.0839, 3.0897)
(18,11) V_9  3  0.5261  (0.3264, 1.1449)
7  1.4039  (0.8403, 3.1833)
10  2.2299  (2.2221, 2.8484)
(18,11) V_10  3  0.4196  (−0.6049, 1.5668)
7  1.4171  (1.0248, 2.8736)
10  2.4701  (1.1667, 2.5096)
Table 5. Cluster maxima of daily ozone concentrations (excesses over 0.11 ppm) at Pedregal station (2002–2007) during the summer months.
0.098  0.073  0.108  0.085  0.123  0.105  0.025  0.112  0.174  0.025
0.045  0.039  0.071  0.07  0.001  0.039  0.016  0.03  0.003  0.002
0.013  0.025  0.01  0.017  0.071  0.073  0.047  0.064  0.009  0.047
0.062  0.013  0.002  0.041  0.048  0.009  0.04  0.047  0.043  0.029
0.026  0.05  0.047  0.084  0.071  0.022  0.008  0.073  0.04  0.023
Table 6. Goodness-of-fit tests and MLEs results.
Distribution  α̂_mle  λ̂_mle  AIC  BIC  K-S
GPD  0.2527  2.3156  112.2413  120.3523  0.1253
SED  0.0112  0.3296  164.2942  160.1558  0.3845
IWD  1.5027  3.4979  150.1485  156.2629  0.2987
Table 7. MLEs for parameters using the EM method and Bayesian estimation with the MH method.
Scheme  MLE  SEL  BSEL (γ = 0.4)  BSEL (γ = 0.8)  LINEX (k = −5)  LINEX (k = 5)
1  0.0195  0.0383  0.0308  0.0232  0.0253  0.0235
   1.2272  1.0592  1.1288  1.1925  1.2645  1.1456
2  0.0310  0.0448  0.0393  0.0338  0.0392  0.0371
   1.0915  1.0133  1.1285  1.1946  1.2644  1.1448
Table 8. Bayesian estimations for α and λ using the TK method.
Scheme  SEL  BSEL (γ = 0.4)  BSEL (γ = 0.8)  LINEX (k = −5)  LINEX (k = 5)
1  0.2592  0.0237  0.02002  0.0245  0.0061
   1.1211  1.2111  1.2219  1.2890  1.7346
2  0.0112  0.3296  164.2942  160.1558  0.3845
   1.0719  1.0782  1.0872  1.1266  1.3988
Table 9. Bayesian point predictions and 95% Bayesian interval predictions. The predictions are generated for the values $y_2$ and $y_6$ (collectively referred to as $y_p$), utilizing a future dataset of size $K = 10$.
Scheme  p  Point Prediction  Interval Prediction
1  2  1.3092  (7.982 × 10^{−5}, 3.763)
   6  3.9855  (3.8982, 5.5566)
2  2  1.4142  (4.707 × 10^{−5}, 3.6202)
   6  4.4400  (4.0039, 5.6998)

Share and Cite

MDPI and ACS Style

Ye, T.; Gui, W. Estimation and Bayesian Prediction of the Generalized Pareto Distribution in the Context of a Progressive Type-II Censoring Scheme. Appl. Sci. 2024, 14, 8433. https://doi.org/10.3390/app14188433
