Article

Statistical Inference on the Shape Parameter of Inverse Generalized Weibull Distribution

1 Department of Mathematics and Statistics, Connecticut College, New London, CT 06320, USA
2 Shailesh J. Mehta School of Management, Indian Institute of Technology Bombay, Mumbai 400076, India
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(24), 3906; https://doi.org/10.3390/math12243906
Submission received: 15 October 2024 / Revised: 5 December 2024 / Accepted: 9 December 2024 / Published: 11 December 2024
(This article belongs to the Special Issue Sequential Sampling Methods for Statistical Inference)

Abstract

In this paper, we propose statistical inference methodologies for estimating the shape parameter α of the inverse generalized Weibull (IGW) distribution. Specifically, we develop two approaches: (1) a bounded-risk point estimation strategy for α and (2) a fixed-accuracy confidence interval estimation method for α. For (1), we introduce a purely sequential estimation strategy, which is theoretically shown to possess desirable first-order efficiency properties. For (2), we present a method that allows for the precise determination of sample size without requiring prior knowledge of the other two parameters of the IGW distribution. To validate the proposed methods, we conduct extensive simulation studies that demonstrate their effectiveness and consistency with the theoretical results. Additionally, real-world data applications are provided to further illustrate the practical applicability of the proposed procedures.
MSC:
62L05; 62L12; 62F10; 62F12; 62F25

1. Introduction

Weibull distribution, which is named after the Swedish mathematician Waloddi Weibull [1], has been widely used in diverse disciplines to study many different issues. Some examples are food science, survival analysis, reliability engineering, and weather forecasting. One is referred to Lai et al. (2006) [2] for some detailed discussions on the applications of this model. Moreover, Weibull distribution is related to many other probability distributions, such as the exponential distribution and the Rayleigh distribution, which makes it a more versatile and flexible model.
The probability density of a random variable, S, following Weibull distribution is
$$f(s;\lambda,k)=\frac{k}{\lambda}\left(\frac{s}{\lambda}\right)^{k-1}e^{-(s/\lambda)^{k}},\quad s\ge 0, \tag{1}$$
with k > 0 being the shape parameter and λ > 0 being the scale parameter.
The Weibull distribution accommodates increasing and decreasing failure rates. However, it does not allow non-monotone failure rates, although non-monotone failure rates are very common in real life. In particular, engineering and biological sciences often involve bathtub-shaped failure rates. Thus, a generalized Weibull distribution handling bathtub-shaped failure rates was introduced in [3] and was further explored and applied in [4]. The authors named it the exponentiated Weibull (EW) distribution. The added shape parameter allows the distribution to represent increasing, decreasing, and bathtub-shaped hazard functions. Then, Mudholkar and Hutson (1996) [5] introduced a generalized form of the Weibull distribution that goes beyond the EW distribution. This generalization provides even greater flexibility than the EW distribution, with applications specifically highlighted in survival data and more complex time-to-event analyses.
Over the last few years, many variations of Weibull distribution have been introduced. Lai (2014) [6] presented a thorough discussion of generalized Weibull distributions. One can find the details, such as the density function, moment function, and hazard rate function, for many different variations of Weibull distribution, such as inverse Weibull distribution, exponentiated Weibull distribution, and Stacy’s Weibull distribution. Shama et al. (2023) [7] proposed a modified generalized Weibull distribution, providing theoretical insights and demonstrating applications across various fields. Through applications, the authors demonstrated the modified distribution’s effectiveness in fitting real-world data across fields such as reliability engineering, survival analysis, and environmental studies. Alsaggaf et al. (2024) [8] introduced a new generalization of the inverse generalized Weibull distribution, enhancing estimation methods and exploring its applications in medicine and engineering. There are other generalized distributions and their inversed versions. XLindley is a great example to mention here. Gemeay et al. (2023) [9] developed a modified XLindley distribution, detailing its properties, estimation techniques, and applications in reliability and risk assessment. Beghriche et al. (2023) [10] introduced inverse XLindley distribution, examining its properties and demonstrating its applications in modeling lifetime data.
Statistical inference is frequently performed using an existing data set or a data set that is readily accessible. However, some statistical inference problems have no fixed-sample-size solutions, according to [11]. That is why sequential analysis is necessary and important: it provides a way to decide on the necessary sample size while conducting statistical inference under certain accuracy requirements. Anscombe (1952) and Chow and Robbins (1965) [12,13] provided sequential procedures for creating a fixed-width confidence interval for the mean parameter of a normal distribution when the variance is also unknown. Sequential procedures for point estimation when one wants to limit the risk to its minimum or to a certain value are also well developed in the literature; one may refer to Mukhopadhyay (1987), Zacks and Mukhopadhyay (2006), and Mahmoudi et al. (2019) [14,15,16]. A comprehensive review of sequential methodologies can be found in [17]. Recently, Mukhopadhyay and Banerjee (2014) [18] developed a new class of fixed-accuracy confidence interval methodologies with a preassigned confidence coefficient for the mean of a negative binomial distribution. They provided appropriate ways to develop confidence intervals that lie completely inside the parameter space when the unknown parameter under consideration is positive. Building on this idea, many other confidence interval estimation methodologies were developed. Mukhopadhyay and Zhuang (2016) [19] created a fixed-accuracy confidence interval for the parameter from Fisher’s “Nile” example; Bapat (2018) [20] discussed fixed-accuracy confidence intervals for P(X < Y) under bivariate exponential models; and Zhuang et al. (2020) [21] developed a fixed-accuracy confidence interval of P(X > c) for a two-parameter gamma population.
In this paper, we focus on statistical inference for the parameter α of the inverse generalized Weibull distribution, including both point estimation and confidence interval estimation procedures. If a random variable, X, follows the inverse generalized Weibull (IGW) distribution, then its probability density function (p.d.f.) can be written as follows:
$$f(x,\alpha,\beta,\lambda)=\alpha\beta\lambda^{\beta}x^{-(\beta+1)}e^{-(\lambda/x)^{\beta}}\left[1-e^{-(\lambda/x)^{\beta}}\right]^{\alpha-1},\quad x\ge 0, \tag{2}$$
with α, β, λ > 0. Here, α is the shape parameter that we focus on, and we discuss it both when β and λ are known and when they are unknown. To maintain brevity and consistency, we refer to the shape parameter as α throughout this paper.
Following [22], we use the abbreviated name IGW for the distribution. This distribution is closely related to the EW distribution of [3,4] mentioned earlier. However, the EW distribution is the exponentiated Weibull, while the distribution we investigate here is the exponentiated inverse Weibull. Here, α is the exponentiation parameter. It is an important parameter that controls how the distribution behaves, particularly in terms of tail heaviness and shape. To the best of our knowledge, there is no literature on bounded-risk point estimation or confidence interval estimation of α, especially when one wants to use a minimal required sample with the required accuracy.
The estimation of parameters from exponentiated distributions is not rare in the literature. Mudholkar and Srivastava (1993) [3] discussed the exponentiated Weibull distribution and used maximum likelihood estimation (MLE) to estimate the parameters. NAG routines (COSNCF and COSPBF) were used to solve the MLE. Balakrishnan and Sandhu (1995) [23] used MLE and the method of moments for estimating the parameters of the exponentiated exponential distribution. They also emphasized the flexibility introduced via the exponentiation parameter α , which allows for a wider variety of decay rates in modeling lifetime or reliability data. Sharma and Shanker (2010) [24] introduced the exponentiated generalized exponential distribution, which generalizes both the exponential and Weibull distributions by incorporating an exponentiation parameter α . They also discussed using MLE to estimate all parameters of the EGE distribution, including α , but the likelihood function is nonlinear and requires numerical methods. Similarly, Zhao and Zhang (2012) [25] explored exponentiated Pareto distribution and investigated the MLE method for estimating all parameters through numerical optimization techniques. In this work, we provide methodologies for estimating α through a well-designed procedure for both point estimation and confidence interval estimation. Our methods require the least amount of sample observations while satisfying the required accuracy in estimation risk or confidence level.
We want to emphasize that it is common in the literature that one may focus on estimating one parameter or even the function of one parameter while assuming the other involved parameters are known. Zhuang et al. (2020) [21] developed fixed-accuracy confidence interval estimation methodologies for a function of the rate parameter when the shape parameter is both unknown and known. Chaturvedi and Pathak (2015) [26] developed Bayes estimation strategies for the reliability function under type II censoring for a three-parameter exponentiated Weibull distribution, assuming that one of the shape parameters is unknown while the other two parameters are known. Chaturvedi et al. (2022) [27] worked on sequential estimation for an inverse Gaussian mean when the coefficient of variation is known.
The structure of the rest of the paper is as follows: In Section 2, we start with the maximum likelihood estimator of the shape parameter of the inverse generalized Weibull distribution. We propose the bounded-risk point estimation for the shape parameter, given that the other two parameters are known, in Section 2.1. We then introduce the fixed-accuracy confidence interval estimation for the shape parameter, regardless of the other two parameters, in Section 2.2. Section 2 also includes appealing properties for both inference methodologies. Section 3 discusses simulation studies that validate the theoretical results, and Section 4 presents real data analysis for illustrative purposes. We conclude in Section 5, sharing some final thoughts.

2. Idea Formulation

Let $x_1, \ldots, x_n$ be a random sample of size n from the IGW distribution with the p.d.f. given in (2); then, the likelihood function of the sample data is given by
$$L(X,\alpha,\beta,\lambda)=\prod_{i=1}^{n}f(x_i,\alpha,\beta,\lambda)=\prod_{i=1}^{n}\alpha\beta\lambda^{\beta}x_i^{-(\beta+1)}e^{-(\lambda/x_i)^{\beta}}\left[1-e^{-(\lambda/x_i)^{\beta}}\right]^{\alpha-1},$$
and then the log-likelihood function can be written as
$$\begin{aligned}
l(X,\alpha,\beta,\lambda)
&= n\log\alpha+n\log\beta+n\beta\log\lambda-(\beta+1)\sum_{i=1}^{n}\log(x_i)-\sum_{i=1}^{n}\left(\frac{\lambda}{x_i}\right)^{\beta}\\
&\quad+(\alpha-1)\sum_{i=1}^{n}\log\left[1-e^{-(\lambda/x_i)^{\beta}}\right].
\end{aligned}$$
Thus, the maximum likelihood estimator (MLE) for the shape parameter α would be as follows:
$$\hat{\alpha}=-\frac{n}{\sum_{i=1}^{n}\log\left[1-e^{-(\lambda/x_i)^{\beta}}\right]}. \tag{7}$$
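As a rough illustration of the estimator above, the following Python sketch evaluates the MLE of α for given β and λ. The function name `igw_shape_mle` is hypothetical and not part of the paper; the paper itself works in R.

```python
import math


def igw_shape_mle(xs, beta, lam):
    """MLE of alpha for IGW data xs when beta and lam are known.

    Computes alpha_hat = -n / sum(log(1 - exp(-(lam / x) ** beta))).
    """
    if beta <= 0 or lam <= 0:
        raise ValueError("beta and lam must be positive")
    # Y = -sum log(1 - exp(-t)) with t = (lam / x)^beta; expm1 keeps
    # precision when t is small, i.e. when x is large.
    y = -sum(math.log(-math.expm1(-((lam / x) ** beta))) for x in xs)
    return len(xs) / y
```

For example, `igw_shape_mle(data, beta=2.0, lam=3.0)` would return the estimate for a list `data` of positive observations; the values 2.0 and 3.0 are placeholders only.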

2.1. Bounded-Risk Point Estimation

Start with n independent random observations of $X \sim IGW(\alpha,\beta,\lambda)$; if we decide to use the MLE for α, $\hat{\alpha}$, then the squared-error loss function would be
$$Loss(\hat{\alpha},\alpha;X)=(\hat{\alpha}-\alpha)^{2}, \tag{8}$$
with $\hat{\alpha}$ coming from (7) and α being the shape parameter of the distribution $IGW(\alpha,\beta,\lambda)$.
The risk associated with the squared-error loss would be
$$Risk(\hat{\alpha},\alpha;X)=E[(\hat{\alpha}-\alpha)^{2}]. \tag{9}$$
Before we get to the explicit form of the risk function, we first address the distribution of $\hat{\alpha}$.
If we denote $Y=-\sum_{i=1}^{n}\log\left[1-e^{-(\lambda/x_i)^{\beta}}\right]$, then the MLE for α can be written as $n/Y$. One can easily check that $Y\sim Gamma(n,1/\alpha)$, with shape n and scale 1/α. Thus, the distribution of 1/Y is an inverse gamma distribution, denoted as $IG(n,\alpha)$. Using the properties of the inverse gamma distribution, when n > 2,
$$E[1/Y]=\frac{\alpha}{n-1},\qquad E[1/Y^{2}]=\frac{\alpha^{2}}{(n-1)(n-2)}. \tag{10}$$
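The gamma claim can be checked with a short probability-integral-transform argument. The sketch below assumes the distribution function obtained by integrating the density in (2); the symbols $V_i$ and $U_i$ are introduced here only for illustration.

```latex
\begin{aligned}
F(x) &= 1-\bigl[1-e^{-(\lambda/x)^{\beta}}\bigr]^{\alpha}, \qquad
V_i = 1-e^{-(\lambda/X_i)^{\beta}}, \\
V_i^{\alpha} &= 1-F(X_i) = U_i \sim \mathrm{Uniform}(0,1), \\
P(-\log V_i > t) &= P\bigl(U_i < e^{-\alpha t}\bigr) = e^{-\alpha t}, \quad t>0, \\
Y &= \sum_{i=1}^{n}(-\log V_i) \sim \mathrm{Gamma}(n, 1/\alpha), \qquad
E[Y^{-r}] = \alpha^{r}\,\frac{\Gamma(n-r)}{\Gamma(n)}, \quad n>r .
\end{aligned}
```

Setting r = 1 and r = 2 recovers the two moments of 1/Y displayed above.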
Then, one may derive the exact distribution of $\hat{\alpha}\equiv Z$, with its p.d.f. given by
$$f(z)=\frac{(n\alpha)^{n}}{\Gamma(n)}\,z^{-(n+1)}e^{-n\alpha/z},\quad z>0. \tag{11}$$
The following moments of $\hat{\alpha}\equiv Z$ are obtained when n > 2:
$$E[\hat{\alpha}]=\frac{\alpha n}{n-1},\qquad E[\hat{\alpha}^{2}]=\frac{\alpha^{2}n^{2}}{(n-1)(n-2)}. \tag{12}$$
Now, we come back to the risk function and derive its explicit form:
$$R_n\equiv Risk(\hat{\alpha},\alpha;X)=E[(\hat{\alpha}-\alpha)^{2}]=\alpha^{2}\,\frac{n+2}{(n-1)(n-2)}. \tag{13}$$
We further require the estimation to have a fixed preassigned risk bound, say δ (> 0), which means
$$R_n=\alpha^{2}\,\frac{n+2}{(n-1)(n-2)}<\delta. \tag{14}$$
Thus, upon solving the above inequality in terms of n, that is, $\delta n^{2}-(3\delta+\alpha^{2})n+2(\delta-\alpha^{2})\ge 0$, we would require
$$n\ge\delta^{-1/2}\sqrt{0.25\,\delta^{-1}\alpha^{4}+3.5\,\alpha^{2}+0.25\,\delta}+0.5\,\delta^{-1}\alpha^{2}+1.5\equiv n^{*},\ \text{say}. \tag{15}$$
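To make the sample-size requirement concrete, the sketch below evaluates the exact risk and the larger root of the quadratic implied by the risk bound. It is an illustration only: the function names `risk` and `optimal_n` and the values α = 4, δ = 0.5 are assumptions chosen for the example.

```python
import math


def risk(n, alpha):
    """Exact risk R_n = alpha^2 (n + 2) / ((n - 1)(n - 2)), valid for n > 2."""
    if n <= 2:
        raise ValueError("the risk is finite only for n > 2")
    return alpha ** 2 * (n + 2) / ((n - 1) * (n - 2))


def optimal_n(alpha, delta):
    """Larger root of delta*n^2 - (3*delta + alpha^2)*n + 2*(delta - alpha^2) = 0."""
    disc = alpha ** 4 + 14.0 * delta * alpha ** 2 + delta ** 2
    return (3.0 * delta + alpha ** 2 + math.sqrt(disc)) / (2.0 * delta)


# Illustrative values only: alpha = 4 and risk bound delta = 0.5.
n_star = optimal_n(4.0, 0.5)
n_needed = math.ceil(n_star)  # smallest integer sample size meeting the bound
print(n_star, n_needed, risk(n_needed, 4.0))
```

Since α is unknown in practice, this calculation is only available as a benchmark, which is what motivates a sequential rule.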
Since α is unknown, there does not exist a fixed-sample-size solution for determining the sample size, $n^{*}$. Thus, we turn to a sequential procedure,
$$N\equiv N(\delta)=\inf\left\{n>2:\ n\ge\delta^{-1/2}\sqrt{0.25\,\delta^{-1}\hat{\alpha}_n^{4}+3.5\,\hat{\alpha}_n^{2}+0.25\,\delta}+0.5\,\delta^{-1}\hat{\alpha}_n^{2}+1.5\right\}, \tag{16}$$
with $\hat{\alpha}_n$ being the MLE of the shape parameter α based on the first n observations and δ being the fixed preassigned risk bound. The procedure is implemented in the following sense: One starts with a small number of sample observations, for example, n = 10, and checks whether the condition in (16) is satisfied. If it is satisfied, one uses these 10 observations to conduct the point estimation for α. If it is not satisfied, one adds one observation at a time until the condition is satisfied. It is worth mentioning that n > 2 is a basic requirement due to the aforementioned derivations, such as Equation (13).
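The stopping rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the boundary is taken as the larger root of the quadratic implied by the risk bound, observations are drawn by inverting the IGW distribution function (a substitute for the rejection sampler used in the paper), and the names `draw_igw`, `boundary`, and `sequential_estimate` are hypothetical.

```python
import math
import random


def draw_igw(alpha, beta, lam, rng):
    """One IGW draw by inverting F(x) = 1 - (1 - exp(-(lam/x)^beta))^alpha."""
    while True:
        v = rng.random() ** (1.0 / alpha)  # V = (1 - F(X))^(1/alpha)
        if 0.0 < v < 1.0:
            return lam * (-math.log1p(-v)) ** (-1.0 / beta)


def boundary(alpha_hat, delta):
    """Right-hand side of the stopping rule evaluated at alpha_hat."""
    disc = alpha_hat ** 4 + 14.0 * delta * alpha_hat ** 2 + delta ** 2
    return (3.0 * delta + alpha_hat ** 2 + math.sqrt(disc)) / (2.0 * delta)


def sequential_estimate(next_obs, beta, lam, delta, pilot=10):
    """Sample one observation at a time until n >= boundary(alpha_hat_n, delta).

    Returns the stopping time N and the terminal estimate alpha_hat_N.
    """
    n, y = 0, 0.0
    while True:
        x = next_obs()
        n += 1
        y -= math.log(-math.expm1(-((lam / x) ** beta)))  # running value of Y
        if n >= pilot:
            alpha_hat = n / y
            if n >= boundary(alpha_hat, delta):
                return n, alpha_hat


# Illustrative run: IGW(2, 2, 4) with risk bound delta = 0.05.
rng = random.Random(2024)
n_stop, a_hat = sequential_estimate(lambda: draw_igw(2.0, 2.0, 4.0, rng), 2.0, 4.0, 0.05)
print(n_stop, a_hat)
```

Passing the data source as a callable keeps the rule independent of how observations arrive, so the same function could be fed records from a file instead of simulated draws.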
Upon termination, we would have the following estimators for α and the risk, R:
$$\hat{\alpha}_N=-\frac{N}{\sum_{i=1}^{N}\log\left[1-e^{-(\lambda/x_i)^{\beta}}\right]},$$
$$\hat{R}_N=E[(\hat{\alpha}_N-\alpha)^{2}].$$
In the following, we establish interesting first-order properties of our proposed purely sequential procedure and the associated sample size N given in (16). We also include the properties of the estimation risk in Theorem 2.
Theorem 1.
For the purely sequential methodology (16), for fixed values of α, β, λ, we have the following as δ → 0:
(i) $N/n^{*}\xrightarrow{P}1$;
(ii) $E[n^{*}/N]\to 1$;
(iii) $E[N/n^{*}]\to 1$.
Proof. 
Part (i): Let $\delta^{-1/2}\sqrt{0.25\,\delta^{-1}\hat{\alpha}_N^{4}+3.5\,\hat{\alpha}_N^{2}+0.25\,\delta}+0.5\,\delta^{-1}\hat{\alpha}_N^{2}+1.5$ be denoted as $\hat{\kappa}_N$. From (16), one can thus obtain the following inequality:
$$\hat{\kappa}_N\le N\le\hat{\kappa}_{N-1}+(m-1)I(N=m)+1,$$
where m denotes the fixed pilot sample size. Now, dividing the above inequality throughout by $n^{*}$, we get
$$\frac{\hat{\kappa}_N}{n^{*}}\le\frac{N}{n^{*}}\le\frac{\hat{\kappa}_{N-1}}{n^{*}}+\frac{(m-1)}{n^{*}}I(N=m)+\frac{1}{n^{*}}. \tag{19}$$
Now, taking limits throughout (19) and noting the facts that $N\to\infty$ w.p.1, $\hat{\alpha}_N\to\alpha$ w.p.1, $\hat{\kappa}_N/n^{*}\to 1$ w.p.1, $m/n^{*}=O(1)$, and $P(N=m)\to 0$ as δ → 0 completes the proof.
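Part (ii): From the left-hand side of (19), we get the following inequality, w.p.1:
$$\frac{N}{n^{*}}\ge\frac{\hat{\kappa}_N}{n^{*}}.$$
Thus,
$$\frac{n^{*}}{N}\le\frac{n^{*}}{\hat{\kappa}_N}\le n^{*}\times\sup_{n\ge 1}\frac{1}{\delta^{-1/2}\sqrt{0.25\,\delta^{-1}\hat{\alpha}_n^{4}+3.5\,\hat{\alpha}_n^{2}+0.25\,\delta}+0.5\,\delta^{-1}\hat{\alpha}_n^{2}+1.5}=W,\ \text{say}.$$
Now, W is integrable, given Wiener’s ergodic theorem (1939). Thus, using Part (i), we conclude that
$$E\left[\frac{n^{*}}{N}\right]\to 1.$$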
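Part (iii): From the right-hand side of (19), we get the following inequality, w.p.1:
$$\frac{N}{n^{*}}\le\frac{\hat{\kappa}_{N-1}}{n^{*}}+\frac{(m-1)}{n^{*}}I(N=m)+\frac{1}{n^{*}}.$$
Thus,
$$\frac{N}{n^{*}}\le\sup_{n\ge 2}\frac{\delta^{-1/2}\sqrt{0.25\,\delta^{-1}\hat{\alpha}_{n-1}^{4}+3.5\,\hat{\alpha}_{n-1}^{2}+0.25\,\delta}+0.5\,\delta^{-1}\hat{\alpha}_{n-1}^{2}+1.5}{\delta^{-1/2}\sqrt{0.25\,\delta^{-1}\alpha^{4}+3.5\,\alpha^{2}+0.25\,\delta}+0.5\,\delta^{-1}\alpha^{2}+1.5}+\frac{m}{n^{*}}=W,\ \text{say}.$$
Again, W is integrable, given Wiener’s ergodic theorem (1939). Thus, using Part (i), we conclude that
$$E\left[\frac{N}{n^{*}}\right]\to 1.$$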
Theorem 2.
For the purely sequential methodology (16), for fixed values of α, β, λ, we have as δ → 0:
$$\hat{R}_N/\delta\to 1.$$
Proof. 
First, note that N from (16) is an observable finite random variable, and the event N = n is measurable only with respect to $\{\hat{\alpha}_j;\ m\le j\le n\}$ for all fixed $n\ge m$. Also, as seen earlier, $\hat{\alpha}_n/\alpha\sim IG(n,n)$ for all fixed $n\ge m$. Thus, this distribution is completely free from α. Now note that
$$\begin{aligned}
\hat{R}_N=E[(\hat{\alpha}_N-\alpha)^{2}]&=E\{E[(\hat{\alpha}_N-\alpha)^{2}\mid N=n]\}\\
&=\sum_{m\le n<\infty}E[(\hat{\alpha}_N-\alpha)^{2}\mid N=n]\,P(N=n)\\
&=\sum_{m\le n<\infty}E[(\hat{\alpha}_n-\alpha)^{2}\mid N=n]\,P(N=n)\\
&=\alpha^{2}\sum_{m\le n<\infty}E\left[\left(\frac{\hat{\alpha}_n}{\alpha}-1\right)^{2}\,\Big|\,N=n\right]P(N=n).
\end{aligned}$$
Now, since the distribution of $\hat{\alpha}_n/\alpha$ is free from α, and hence from N, we can drop the conditioning and thus write
$$\begin{aligned}
\hat{R}_N&=\alpha^{2}\sum_{m\le n<\infty}E\left[\left(\frac{\hat{\alpha}_n}{\alpha}-1\right)^{2}\right]P(N=n)
=\sum_{m\le n<\infty}E[(\hat{\alpha}_n-\alpha)^{2}]\,P(N=n)\\
&=\sum_{m\le n<\infty}\alpha^{2}\,\frac{n+2}{(n-1)(n-2)}\,P(N=n)
=\sum_{m\le n<\infty}\delta\left[\frac{n^{*}+2}{(n^{*}-1)(n^{*}-2)}\right]^{-1}\frac{n+2}{(n-1)(n-2)}\,P(N=n).
\end{aligned}$$
Thus, we have
$$\delta^{-1}\hat{R}_N=E\left[\left(\frac{n^{*}+2}{(n^{*}-1)(n^{*}-2)}\right)^{-1}\frac{N+2}{(N-1)(N-2)}\right]=E\left[\frac{N+2}{n^{*}+2}\cdot\frac{(n^{*}-1)(n^{*}-2)}{(N-1)(N-2)}\right].$$
Now again, utilizing the ideas from parts (ii) and (iii) of Theorem 1, we can show that $\frac{N+2}{n^{*}+2}\cdot\frac{(n^{*}-1)(n^{*}-2)}{(N-1)(N-2)}<W$, where W is assumed to be the corresponding supremum. Thus, from part (i) of Theorem 1, we finally have $\hat{R}_N/\delta\to 1$, which completes the proof. □

2.2. Fixed-Accuracy Confidence Interval

In this section, we move on to discuss confidence interval estimation for the shape parameter α when the other two parameters of the IGW distribution are unknown. We need to point out that α is a positive unknown parameter; thus, the traditional fixed-width confidence interval may not be appropriate, since its lower bound can fall below 0. With the aim of creating a confidence interval with both lower and upper bounds above 0 while keeping the confidence level at a targeted level, such as 100(1 − η)%, a fixed-accuracy confidence interval is a natural choice.
The fixed-accuracy confidence interval for α is of the following form:
$$I\equiv I(d)=\left(\hat{\alpha}/d,\ \hat{\alpha}d\right), \tag{25}$$
where d (> 1) is the fixed-accuracy constant; then, we require a 100(1 − η)% confidence level through
$$P\{\alpha\in I\}=P\{\hat{\alpha}/d<\alpha<\hat{\alpha}d\}\ge 1-\eta. \tag{26}$$
Along the lines of (10) to (12), we can further rewrite (26) as the following:
$$P\{\alpha\in I\}=P\{d^{-1}<\alpha/\hat{\alpha}<d\}=P\{d^{-1}<\alpha Y/n<d\}=P\{d^{-1}<T<d\}\ge 1-\eta. \tag{27}$$
Here, $T\equiv\alpha Y/n\sim Gamma(n,1/n)$. One can note that (27) does not depend on the value of the unknown shape parameter, α. Moreover, it does not involve the other two unknown parameters, β and λ, of the IGW distribution. It is determined only through the sample size n, the accuracy constant d, and the preassigned confidence level 100(1 − η)%. That is, with given fixed values of d and η, one can determine the required sample size immediately for constructing the fixed-accuracy confidence interval, as per (25). Some selective simulation studies and real data applications will be discussed in Section 3 and Section 4.
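Because n is an integer, the coverage probability has a closed form through the Erlang distribution function. The expression below is a sketch under the gamma law stated above; the symbol $G_n$ is introduced here only for illustration.

```latex
\begin{aligned}
nT = \alpha Y &\sim \mathrm{Gamma}(n, 1), \qquad
G_n(x) = 1 - e^{-x}\sum_{k=0}^{n-1}\frac{x^{k}}{k!}, \\
P\{d^{-1} < T < d\} &= G_n(nd) - G_n(n/d)
= e^{-n/d}\sum_{k=0}^{n-1}\frac{(n/d)^{k}}{k!} - e^{-nd}\sum_{k=0}^{n-1}\frac{(nd)^{k}}{k!}.
\end{aligned}
```

For instance, with n = 1 and d = 2, the expression reduces to $e^{-1/2}-e^{-2}$.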

3. Simulation Study

In this section, we first simulate a random sample from the IGW distribution using the R software (R 4.3.1). Since the IGW distribution is not among the distributions with ready-made generators in R, we used a rejection sampling technique to generate samples from it. The basic idea of rejection sampling is to generate candidate observations from a simple proposal density and to reject the candidates that fall outside the region under the IGW density curve. The steps for generating n independent random observations of $X\sim IGW(\alpha,\beta,\lambda)$ using rejection sampling are as follows.
One may note that we are discussing the sampling method for I G W ( α = 4 , β = 2 , λ = 3 ) , and we are generating 1500 observations. It can be generalized to any IGW distributions with different parameter values, with a desired sample size. Moreover, following similar steps, one can generate a random sample from any new distribution function that is not previously defined or developed in R.
  • Step 1: Generate a random observation, $X_1$, from a uniform distribution, Uniform(a, b), namely $g(x)=\frac{1}{b-a}I(a<x<b)$, where a = 0 and b = 30. The values of a and b are chosen to cover essentially all possible observations from IGW(α = 4, β = 2, λ = 3); in particular, b is chosen so that the integral of the IGW density function from 0 to b is approximately 1.
  • Step 2: Generate a random observation, $U_1$, from Uniform(0, C g(x)), where g(x) is the density function of the uniform distribution from Step 1 and the constant C is chosen so that C g(x) equals the maximum value of the IGW density function. For our simulation, this maximum value turns out to be 0.333.
  • Step 3: Make the acceptance and rejection decision. Compare $U_1$ with the IGW density value at $X_1$. If $U_1$ is less than or equal to the density function value at $X_1$, then we accept $X_1$ as one sample observation from the IGW distribution; if not, we reject it and continue the sampling.
  • Step 4: Repeat steps 1–3 until we have our desired number of sample observations from the IGW distribution.
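The four steps above can be sketched in Python as follows. This is an illustration under assumptions rather than the authors' R code: the envelope height is found by a grid search over (a, b) and inflated by 5% as a safety margin, and the names `igw_pdf` and `rejection_sample` are hypothetical.

```python
import math
import random


def igw_pdf(x, alpha, beta, lam):
    """IGW density from (2); returns 0 outside the support."""
    if x <= 0.0:
        return 0.0
    log_t = beta * math.log(lam / x)
    if log_t > 6.5:  # exp(-t) is negligible here; also avoids overflow
        return 0.0
    t = math.exp(log_t)  # t = (lam / x)^beta
    return (alpha * beta * lam ** beta * x ** (-(beta + 1.0))
            * math.exp(-t) * (-math.expm1(-t)) ** (alpha - 1.0))


def rejection_sample(n, alpha, beta, lam, a=0.0, b=30.0, rng=None, grid=20000):
    """Draw n observations with a Uniform(a, b) proposal and a flat envelope."""
    rng = rng or random.Random()
    # Envelope height: grid maximum of the density plus a 5% safety margin.
    height = 1.05 * max(
        igw_pdf(a + (b - a) * (i + 0.5) / grid, alpha, beta, lam)
        for i in range(grid)
    )
    sample = []
    while len(sample) < n:
        x = a + (b - a) * rng.random()  # Step 1: candidate from Uniform(a, b)
        u = height * rng.random()       # Step 2: height from Uniform(0, envelope)
        # Step 3: accept when u falls under the density (strict "<" avoids
        # accepting a point of zero density when u happens to be 0).
        if u < igw_pdf(x, alpha, beta, lam):
            sample.append(x)
    return sample                       # Step 4: loop until n draws are accepted


xs = rejection_sample(1500, alpha=4.0, beta=2.0, lam=3.0, rng=random.Random(1))
print(len(xs), min(xs), max(xs))
```

A flat envelope over a wide interval is simple but wasteful, since most candidates are rejected; the design is kept here because it mirrors the steps as described.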
After following the steps above, we have generated a random sample from the IGW distribution. However, we need to verify that the sample data are actually from the required IGW distribution. Thus, we make use of the standard approach via a quantile–quantile plot (Q-Q plot) to conduct the validation. For normal data, such a plot can be easily produced in R using the function qqnorm(). However, there is no such function in R to draw a Q-Q plot using sample data from an IGW distribution. Hence, we carry out the following steps to create a Q-Q plot for comparing the sample data with an IGW distribution. Again, similar steps can be utilized to create a Q-Q plot for any other distribution, especially for newly established ones.
  • Step 1: Generate a random sample of size n, $X=(x_1,\ldots,x_n)$, from the IGW(α, β, λ) distribution using the method described above.
  • Step 2: Obtain the theoretical quantiles, say $T=(t_1,\ldots,t_n)$. Loop through each observation from the sample, X; for each $x_i$, obtain its theoretical quantile, $t_i$, by integrating the density function of the specific IGW(α, β, λ) distribution from 0 to $x_i$, that is, the theoretical cumulative probability at $x_i$. As an aside, note that the integrate() function in the statistical software R is very helpful in finding such quantiles. More specifically, $t_i$ = integrate(f(x, α, β, λ), 0, $x_i$), where i = 1, …, n.
  • Step 3: Find the sample quantiles, say $S=(s_1,\ldots,s_n)$. Here, $s_i$ is the ratio of the number of observations that are not greater than $x_i$ to the sample size n.
  • Step 4: Plot the sample quantiles against the theoretical quantiles.
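A minimal Python sketch of Steps 2 and 3 is given below. It departs from the description in one labeled respect: instead of numerical integration, it uses the closed-form distribution function obtained by integrating (2). The names `igw_cdf`, `draw_igw`, and `plot_points` are hypothetical, and the sample is generated by inversion only to keep the example self-contained.

```python
import math
import random


def igw_cdf(x, alpha, beta, lam):
    """F(x) = 1 - (1 - exp(-(lam/x)^beta))^alpha, the integral of the density."""
    if x <= 0.0:
        return 0.0
    return 1.0 - (-math.expm1(-((lam / x) ** beta))) ** alpha


def draw_igw(alpha, beta, lam, rng):
    """One IGW draw by inverting the distribution function."""
    while True:
        v = rng.random() ** (1.0 / alpha)
        if 0.0 < v < 1.0:
            return lam * (-math.log1p(-v)) ** (-1.0 / beta)


def plot_points(xs, alpha, beta, lam):
    """Return (theoretical, sample) values t_i and s_i for the sorted data."""
    ordered = sorted(xs)
    n = len(ordered)
    theoretical = [igw_cdf(x, alpha, beta, lam) for x in ordered]
    # Share of observations not greater than the i-th smallest value,
    # assuming no ties in the data.
    empirical = [(i + 1) / n for i in range(n)]
    return theoretical, empirical


rng = random.Random(7)
data = [draw_igw(4.0, 2.0, 3.0, rng) for _ in range(500)]
t_vals, s_vals = plot_points(data, 4.0, 2.0, 3.0)
print(max(abs(t - s) for t, s in zip(t_vals, s_vals)))
```

Plotting `s_vals` against `t_vals` with any plotting library and comparing the points to the 45-degree line completes Step 4; the printed maximum gap gives a quick numerical summary of the same comparison.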
From the plots given in Figure 1, we can see that the sample data indeed come from an IGW distribution, no matter whether the sample size is small (n = 100), medium (n = 500), or large (n = 1000).
In Table 1, Table 2, Table 3, Table 4 and Table 5, we summarize a number of selected simulation results. We include results from five different IGW distributions: IGW(0.5, 2, 0.7), IGW(1, 2, 1), IGW(2, 2, 4), IGW(4, 2, 3), and IGW(10, 2, 3). From Figure 2, with five different parameter value combinations, we observe five distinct shapes of the IGW distribution. Thus, the simulation results from these five distributions provide a solid empirical validation in addition to the theoretical findings in Theorems 1 and 2. Moreover, the preassigned risk upper bound, δ, is chosen in order to show the results for small, moderate, and large sample sizes.
Sample data are simulated by exactly following the simulation procedure discussed in the first part of Section 3. The sampling procedure is implemented following the stopping rule, as per (16). For each simulation scenario, that is, for a given combination of parameter values, say α, β, λ, and δ, we replicate the whole process in order to obtain the average of the stopping variable, $\bar{N}$, as per (16). We also calculate the standard error of $\bar{N}$. Additionally, the estimated risk, $\hat{R}_N$, can be calculated based on data upon termination, as per (24). The pilot sample size, m, was fixed at 10 in each case (iteration). One may note that $\bar{N}$ should be close to the optimal sample size, $n^{*}$, and $\hat{R}_N$ should be close to the risk bound δ. The last two columns of Table 1, Table 2, Table 3, Table 4 and Table 5 show the estimated α parameter averaged over 10,000 replicates and its standard errors.
The results consistently show the following: (1) the ratio of $\bar{N}$ to $n^{*}$ stays close to 1 with small standard errors, which empirically supports the properties in Theorem 1; (2) the ratio of the estimated risk, $\hat{R}_N$, to the risk bound, δ, is close to 1, which empirically supports the property in Theorem 2.
Further, we highlight the simulation performance of our developed procedure for creating fixed-accuracy confidence intervals for α. Interestingly, as seen earlier in (27), since the rule for determining the sample size does not depend on any of the unknown parameters, one need not attempt a sequential rule to obtain the optimal sample size, N. We, hence, adopt the following simple approach in order to obtain N, with only pre-fixed η and d.
  • Step 1: Fix the constant d and the required confidence level 1 − η (say 0.95, 0.99, etc.).
  • Step 2: For every candidate value of the optimal sample size, N, starting from 1, let T denote a gamma random variable with parameters N and 1/N.
  • Step 3: Find $P\{d^{-1}<T<d\}$ for every value of N.
  • Step 4: Stop the process when $P\{d^{-1}<T<d\}$ exceeds 1 − η, which yields the final optimal sample size, N.
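The search in Steps 1 to 4 can be sketched in Python as follows. The sketch assumes the gamma law of T with shape N and scale 1/N and evaluates the probability exactly through the Erlang distribution function instead of by simulation; the names `erlang_cdf`, `coverage`, and `min_sample_size` are hypothetical.

```python
import math


def erlang_cdf(x, n):
    """P(G <= x) for G ~ Gamma(shape n, scale 1), n a positive integer."""
    if x <= 0.0:
        return 0.0
    # Upper tail e^{-x} * sum_{k<n} x^k / k!, summed in log space for stability.
    log_x = math.log(x)
    terms = [k * log_x - x - math.lgamma(k + 1) for k in range(n)]
    top = max(terms)
    tail = math.exp(top) * sum(math.exp(t - top) for t in terms)
    return max(0.0, 1.0 - tail)


def coverage(n, d):
    """P(1/d < T < d) for T ~ Gamma(shape n, scale 1/n), i.e. n*T ~ Gamma(n, 1)."""
    return erlang_cdf(n * d, n) - erlang_cdf(n / d, n)


def min_sample_size(d, eta, n_max=100000):
    """Smallest n whose coverage reaches the level 1 - eta."""
    if d <= 1.0 or not 0.0 < eta < 1.0:
        raise ValueError("need d > 1 and 0 < eta < 1")
    for n in range(1, n_max + 1):
        if coverage(n, d) >= 1.0 - eta:
            return n
    raise RuntimeError("no sample size up to n_max reaches the target level")


# Illustrative call with placeholder values d = 1.5 and eta = 0.05.
n_req = min_sample_size(1.5, 0.05)
print(n_req, coverage(n_req, 1.5))
```

An exact evaluation avoids Monte Carlo noise near the threshold 1 − η, where a simulated probability could stop the search one or two steps early or late.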
In Table 6, we summarize the results for Section 2.2, which discusses the fixed-accuracy confidence interval estimation of the parameter α. According to (27), the only quantities that affect the sample size are the fixed-accuracy measurement d and the confidence level 1 − η. Table 6 includes the summarized results when the sample data are simulated from IGW(4, 2, 3). For different values of d, with fixed η = 0.05, following steps 1–4, we can determine the number of observations we need, which is shown in column 2 of Table 6. Clearly, if we fix d to be very small, one requires more samples to be able to meet the 100(1 − η)% level. We replicate the sampling procedure 10,000 times to check the performance of $\hat{\alpha}$, as well as the coverage probability. Column 3 lists the average of the 10,000 values of $\hat{\alpha}$, which are all close to α = 4. Moreover, the last column shows that the coverage probability, in all the cases, is close to the corresponding level, which was set to be 95%. It is worth mentioning that one does not need to know the values of the other two parameters, β and λ, to determine the sample size for the desired confidence interval for α.
Further, Figure 3 shows how the optimal sample size varies over a range of d values for fixed values of η. Clearly, as also seen from the above table, as d decreases, one requires more samples to still attain the confidence interval at the 1 − η level. This is expected because a smaller d means a tighter confidence interval, and one needs more observations to achieve that tighter interval. Also, for the same d, a smaller η means that a larger sample is required, due to the higher confidence level.

4. Real Data Applications

In this section, we apply the proposed methodologies to a couple of practical scenarios. The first set of data we consider is the remission time (in months) of 128 patients suffering from bladder cancer. This data set has been given wide attention. The original data were brought up by Ref. [28]. Ref. [29] discussed this data set using McDonald Lomax distribution. Ref. [30] used this data set on Marshall–Olkin generalized defective Gompertz distribution. Ref. [22] showed the adequacy of inverse generalized Weibull distribution in modeling this data through some goodness of fit measures such as AIC and BIC.
It is reasonable to assume that the remission times of bladder cancer patients follow an IGW distribution. Based on the given data for the 128 patients, as well as the information from [22], we further assume the distribution is IGW(α = 61.38, β = 0.51, λ = 8.19). Figure 3 displays the Q-Q plot for the observed data versus the theoretical IGW distribution, given that α = 61.38, β = 0.51, λ = 8.19, which confirms that this data set is suitable for the proposed methodologies. We have also included the histogram of the observed data, and the density curve, to provide further insight.
To start with, we randomize the original data set and pretend the patients’ data come in this randomized order. Table 7 is a copy of the randomized data records.
We then further set δ = 0.08. Now, as per (16), we implement the sequential sampling procedure. We start with the first two observations and then continue taking one observation at a time, according to Table 7. The sampling procedure is terminated at the 76th record. That is, in real-life situations, if we are collecting the data and waiting to get an estimate of α, we would only require 76 observations. There is no need to continue sampling, which would waste time and money. Upon termination, the estimated α is 60.09, which is close to 61.38, the value assumed to be the population shape parameter. Again, the stopping rule (16) was created based on an upper bound for the estimation risk, and we set it to be 0.08 in this case.
Now, using this real data set, we illustrate the procedures we discussed for constructing a fixed-accuracy confidence interval for α .
Suppose a group of researchers has decided to create a 95% confidence interval for α with d = 1.2. According to (27), we search for the sample size that is needed under such requirements. The smallest sample size that is needed turns out to be 73. Figure 4 shows the relationship between the coverage probability and the required sample size, as per (27). It is clear that, as we increase the sample size N, the coverage probability goes up for fixed values of d and η. Thus, sample size 73 is the minimum number of observations that we need to achieve the targeted level, 95%.
Now, using the data that we set up in Table 7, we just take the first 73 observations and construct the confidence interval for α. These observations give us a 95% confidence interval as (59.84.11, 63.07). This interval covers the true value of α, which is 61.38, by only utilizing 73 observations.
We now work with the second data set and follow a similar approach to find the fixed-accuracy confidence interval for α, as was conducted for the first one. This data set contains waiting times (in minutes) before access to customer service at 100 banks. The data set was analyzed in [31], where the author fitted the Lindley distribution to the data. However, the applicability of an IGW distribution can be seen in [22]. It is reasonable to assume that the waiting times follow an IGW distribution. Based on the given data, and the information given in [22], we further take the distribution to be IGW(α = 40.73, β = 0.77, λ = 8.55). As before, we first randomize the entire data set and pretend the data come in this randomized order. Table 8 is a copy of the randomized data for reference.
Now suppose one needs to create a 95% confidence interval for α with d = 1.2. According to (27), we search for the sample size that is needed under such requirements. The smallest sample size that is needed turns out to be 58. Again, it can be shown that, as we increase the sample size, N, the coverage probability goes up, for fixed values of d and η. Thus, sample size 58 is the minimum number of observations that we need to achieve the targeted level, 95%.
Now, using the randomized data in Table 8, we take just the first 58 observations and construct the confidence interval for α. These observations give a 95% confidence interval of (39.11, 42.72). This interval covers the true value of α, which is 40.73, while using only 58 observations.
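The search for the smallest sample size described in the two examples above can be sketched in code. Rule (27) itself is not reproduced here, so this sketch substitutes an assumed normal approximation for the coverage probability: it supposes that the interval has the form (α̂/d, d α̂) and that log(α̂/α) is roughly N(0, 1/n). The function names `approx_coverage` and `smallest_sample_size` are hypothetical, and the resulting sample sizes need not match those reported in the text.

```python
import math
from statistics import NormalDist

def approx_coverage(n, d):
    """Approximate coverage of the interval (alpha_hat / d, d * alpha_hat).

    Assumption (a stand-in, not the paper's rule (27)): log(alpha_hat / alpha)
    is roughly N(0, 1/n), so coverage is about 2 * Phi(sqrt(n) * log d) - 1.
    """
    return 2.0 * NormalDist().cdf(math.sqrt(n) * math.log(d)) - 1.0

def smallest_sample_size(d, eta, coverage=approx_coverage, n_max=1_000_000):
    """Return the first n with coverage(n, d) >= 1 - eta, scanning n = 1, 2, ..."""
    if d <= 1.0 or not 0.0 < eta < 1.0:
        raise ValueError("need d > 1 and 0 < eta < 1")
    for n in range(1, n_max + 1):
        if coverage(n, d) >= 1.0 - eta:
            return n
    raise RuntimeError("no sample size up to n_max reaches the target coverage")

# Tighter accuracy (smaller d) calls for more observations at the same level.
for d in (1.2, 1.15, 1.1):
    print(d, smallest_sample_size(d, eta=0.05))
```

Any other coverage function with the same `(n, d)` signature can be passed through the `coverage` argument, so the scan itself does not depend on the approximation chosen here.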

5. Conclusions

In this paper, we developed two methodologies for statistical inference on the shape parameter of the inverse generalized Weibull (IGW) distribution, which involves three parameters: α, β, and λ. The first methodology builds a bounded-risk point estimator of α when both β and λ are known. For this, we developed a purely sequential procedure that determines the optimal sample size when the estimation risk is bounded by a pre-assigned value. The purely sequential rule enjoys a number of appealing properties, which were proved in theorems and validated via simulation studies. The second methodology constructs a fixed-accuracy confidence interval for α. Here, no knowledge of the other two parameters, β and λ, is required to decide on the sample size: it can be determined directly for any given confidence level 100(1 − η)% and pre-fixed accuracy parameter d.
Simulating from many standard distributions is easy in the statistical software R. However, simulating from the IGW distribution is not as straightforward. Using the rejection sampling technique, we designed a detailed procedure for simulating sample data from any given distribution, regardless of its complexity. This enabled us to conduct the simulation studies and to illustrate the real-world application of the newly designed methodologies.
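As an illustration of the rejection sampling idea, the following is a minimal sketch in Python rather than R. It assumes a uniform proposal on a bounded interval and a known upper bound on the target density; these choices, the function name `rejection_sample`, and the Weibull stand-in density are all illustrative assumptions and not the procedure or the IGW density used in the paper. A heavy-tailed target would need a different envelope.

```python
import math
import random

def rejection_sample(density, lower, upper, density_max, size, rng=None):
    """Draw `size` values from `density` restricted to [lower, upper].

    Uses a uniform proposal on [lower, upper]; `density_max` must be an
    upper bound on `density` over that interval.
    """
    rng = rng or random.Random()
    draws = []
    while len(draws) < size:
        x = rng.uniform(lower, upper)       # candidate from the proposal
        u = rng.uniform(0.0, density_max)   # height under the flat envelope
        if u <= density(x):                 # keep points below the target curve
            draws.append(x)
    return draws

# Stand-in target (assumption): Weibull density with shape 2 and scale 1,
# NOT the IGW density. Its maximum is about 0.858, so 0.9 is a valid bound.
def weibull_pdf(x, shape=2.0, scale=1.0):
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

sample = rejection_sample(weibull_pdf, 0.0, 5.0, density_max=0.9,
                          size=2000, rng=random.Random(1))
print(sum(sample) / len(sample))  # sample mean; the true mean is about 0.886
```

To simulate from another distribution, one would pass its density together with an interval carrying essentially all of its mass and a valid bound on the density over that interval.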
An important consideration is how to choose appropriate design values, such as the risk bound δ or the fixed-accuracy measure d. For bounded-risk point estimation, one needs to set an upper bound δ for the risk of estimating the parameter α. There is no golden rule for deciding the value of δ. Historical data, if available, may help. Moreover, the cost of sampling and the time required to obtain sample data in different situations can also affect the decision. If only a rough estimate of α is needed, δ may be fixed at a higher value so that fewer observations are required. If higher accuracy is required, δ may be set to a much smaller number. Similarly, for fixed-accuracy confidence interval estimation, one needs to decide on the values of d and η based on the specific problem. Again, a tighter confidence interval means a smaller d and requires more observations; a higher confidence level means a smaller η and also requires a larger sample size.
Generalized Weibull distributions take many different forms with different numbers of parameters. Their flexibility allows wide use in real-life problems. In the future, we will work on estimating all three parameters of the inverse generalized Weibull distribution with a minimal required sample, since one may not have prior knowledge of any of the parameters. Briefly, we outline an idea based on an iterative estimation approach. (1) Begin with a small pilot sample of n 0 observations from the population. In this initial step, set α = 1 to simplify the estimation of β and λ. Using this pilot sample, estimate β and λ by the MLE method, and denote these initial estimates by β ^ 0 and λ ^ 0 . (2) Using β ^ 0 and λ ^ 0 , obtain the initial estimate of α. Following the stopping rule we developed for bounded-risk point estimation, take one more observation at a time. As new observations are added sequentially, re-estimate α with the most recent values of β and λ; then, update β and λ based on the new estimate of α. Repeat this iterative procedure until the stopping criterion for α is met. (3) Once the stopping criterion is met, take the current estimates of α, β, and λ as the final estimates. Similarly, for fixed-accuracy confidence interval estimation, using the determined sample size, one can first obtain the MLEs of β and λ and then obtain the confidence interval for α.

Author Contributions

Conceptualization, Methodology, Writing, Y.Z.; Validation, Writing, S.R.B.; Software, Data curation, Visualization, W.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Weibull, W. Statistical Theory of the Strength of Materials; Generalstabens Litografiska Anstalts Förlag: Stockholm, Sweden, 1939; Volume 151, pp. 1–45.
  2. Lai, C.D.; Murthy, D.N.; Xie, M. Weibull Distributions and Their Applications; Springer Handbooks: London, UK, 2006; pp. 63–78.
  3. Mudholkar, G.S.; Srivastava, D.K. Exponentiated Weibull family for analyzing bathtub failure-rate data. IEEE Trans. Reliab. 1993, 42, 299–302.
  4. Mudholkar, G.S.; Srivastava, D.K.; Freimer, M. The exponentiated Weibull family: A reanalysis of the bus-motor-failure data. Technometrics 1995, 37, 436–445.
  5. Mudholkar, G.S.; Hutson, A.D. The exponentiated Weibull family: Some properties and a flood data application. Commun. Stat. Theory Methods 1996, 25, 3059–3083.
  6. Lai, C.D. Generalized Weibull Distributions; Springer Briefs in Statistics; Springer: Berlin/Heidelberg, Germany, 2014.
  7. Shama, M.S.; El-Gohary, A.; Ramadan, S. Modified generalized Weibull distribution: Theory and applications. Sci. Rep. 2023, 13, 12828.
  8. Alsaggaf, I.A.; Hammood, S.; Mahmoud, M.A.; Bakouch, H.S.; Ali, M.M. A new generalization of the inverse generalized Weibull distribution with different methods of estimation and applications in medicine and engineering. Symmetry 2024, 16, 1002.
  9. Gemeay, A.M.; Bakr, M.E.; Tashkandy, Y.A.; Zeghdoudi, H.; El-Morshedy, M. Modified XLindley distribution: Properties, estimation, and applications. AIP Adv. 2023, 13, 095021.
  10. Beghriche, A.; Tashkandy, Y.A.; Bakr, M.E.; Zeghdoudi, H.; Gemeay, A.M. The inverse XLindley distribution: Properties and application. IEEE Access 2023, 11, 47272–47281.
  11. Dantzig, G.B. On the Non-Existence of Tests of “Student’s” Hypothesis Having Power Functions Independent of σ. Ann. Math. Stat. 1940, 11, 186–192.
  12. Anscombe, F.J. Large-Sample Theory of Sequential Estimation. Proc. Camb. Philos. Soc. 1952, 48, 600–607.
  13. Chow, Y.S.; Robbins, H. On the Asymptotic Theory of Fixed-Width Sequential Confidence Intervals for the Mean. Ann. Math. Stat. 1965, 36, 457–462.
  14. Mukhopadhyay, N. Minimum Risk Point Estimation of the Mean of a Negative Exponential Distribution. Sankhyā Ser. A 1987, 49, 105–112.
  15. Zacks, S.; Mukhopadhyay, N. Bounded Risk Estimation of the Exponential Parameter in a Two-Stage Sampling. Seq. Anal. 2006, 25, 437–452.
  16. Mahmoudi, E.; Roughani, G.; Khalifeh, A. Bounded Risk Estimation of the Gamma Scale Parameter in a Purely Sequential Sampling Procedure. J. Stat. Theory Appl. 2019, 18, 222–235.
  17. Mukhopadhyay, N.; de Silva, B.M. Sequential Methods and Their Applications; CRC: Boca Raton, FL, USA, 2009.
  18. Mukhopadhyay, N.; Banerjee, S. Purely sequential and two-stage fixed-accuracy confidence interval estimation methods for count data from negative binomial distributions in statistical ecology: One-sample and two-sample problems. Seq. Anal. 2014, 33, 251–285.
  19. Mukhopadhyay, N.; Zhuang, Y. On fixed-accuracy and bounded-accuracy confidence interval estimation problems in Fisher’s “Nile” example. Seq. Anal. 2016, 35, 516–535.
  20. Bapat, S.R. Purely sequential fixed accuracy confidence intervals for P(X < Y) under bivariate exponential models. Am. J. Math. Manag. Sci. 2018, 37, 386–400.
  21. Zhuang, Y.; Hu, J.; Zou, Y. Fixed-accuracy confidence interval estimation of P(X > c) for a two-parameter gamma population. Commun. Stat. Appl. Methods 2020, 27, 625–639.
  22. Amirzadi, A.; Baloui, J.E.; Deiri, E. A comparison of estimation methods for reliability function of inverse generalized Weibull distribution under new loss function. J. Stat. Comput. Simul. 2021, 91, 1–28.
  23. Balakrishnan, N.; Sandhu, S.S. Exponentiated Exponential Distribution: An Overview. J. Stat. Comput. Simul. 1995, 52, 157–170.
  24. Sharma, S.; Shanker, K. Exponentiated Generalized Exponential Distribution: Properties and Estimation. J. Stat. Theory Pract. 2010, 4, 344–351.
  25. Zhao, Y.; Zhang, H. Exponentiated Pareto Distribution: Properties and Estimation. Commun. Stat. Theory Methods 2012, 41, 2074–2086.
  26. Chaturvedi, A.; Pathak, A. Bayesian Estimation Procedures for Three-parameter Exponentiated-Weibull Distribution under Squared-Error Loss Function and Type II Censoring. World Eng. Appl. Sci. J. 2015, 6, 45–58.
  27. Chaturvedi, A.; Bapat, S.R.; Joshi, N. Sequential estimation of an inverse Gaussian mean with known coefficient of variation. Sankhya B 2022, 84, 402–420.
  28. Lee, E.T.; Wang, J. Statistical Methods for Survival Data Analysis; John Wiley and Sons: Hoboken, NJ, USA, 2003; Volume 476.
  29. Lemonte, A.J.; Cordeiro, G.M. An extended Lomax distribution. Statistics 2013, 47, 800–816.
  30. Hamdeni, T.; Gasmi, S. The Marshall–Olkin generalized defective Gompertz distribution for surviving fraction modeling. Commun. Stat. Simul. Comput. 2022, 51, 6511–6524.
  31. Ghitany, M.E.; Atieh, B.; Nadarajah, S. Lindley distribution and its application. Math. Comput. Simul. 2008, 78, 493–506.
Figure 1. Q-Q plot for IGW distribution when sample sizes are 100, 500, and 1000.
Figure 2. Distribution plot of a selective range of parameters from IGW.
Figure 3. Varying optimal sample size N over different values of d.
Figure 4. Q-Q plot of remission time for bladder patients versus IGW(α = 61.38, β = 0.51, λ = 8.19); histogram of the real data; density curve of the IGW distribution.
Table 1. Simulation results when the sample data are simulated from IGW(0.5, 2, 0.7) following the stopping rules in (16).
δ        N̄         n         se(N̄)   N̄/n     R_N     R_N/δ   mean α̂   se(mean α̂)
0.006    50.725    48.777    0.113   1.040   0.005   0.871   0.497    0.001
0.003    92.923    90.830    0.168   1.023   0.003   0.932   0.498    0.001
0.002    135.267   132.648   0.301   1.020   0.002   0.965   0.500    0.001
0.001    261.575   257.815   0.434   1.015   0.001   0.974   0.501    0.000
0.0007   369.546   365.011   0.521   1.012   0.001   0.974   0.501    0.000
0.0005   512.516   507.905   1.402   1.009   0.000   1.003   0.501    0.001
Table 2. Simulation results when the sample data are simulated from IGW(1, 2, 1) following the stopping rules in (16).
δ       N̄          n          se(N̄)   N̄/n     R_N     R_N/δ   mean α̂   se(mean α̂)
0.02    55.697     54.289     0.129   1.026   0.018   0.893   0.988    0.001
0.01    105.557    104.388    0.190   1.011   0.010   0.954   0.992    0.001
0.008   130.686    129.409    0.214   1.010   0.008   0.957   0.994    0.001
0.005   205.916    204.442    0.276   1.007   0.005   0.976   0.996    0.001
0.002   506.565    504.476    0.440   1.004   0.002   0.976   0.999    0.000
0.001   1007.801   1004.488   0.630   1.003   0.001   0.993   1.000    0.000
Table 3. Simulation results when the sample data are simulated from IGW(2, 2, 4) following the stopping rules in (16).
δ       N̄         n         se(N̄)   N̄/n     R_N     R_N/δ   mean α̂   se(mean α̂)
0.1     44.021    42.685    0.118   1.031   0.093   0.930   1.965    0.003
0.05    83.933    82.716    0.173   1.015   0.049   0.980   1.979    0.002
0.04    104.078   102.723   0.196   1.013   0.040   1.000   1.984    0.002
0.02    203.574   202.736   0.277   1.004   0.020   1.000   1.990    0.001
0.01    403.393   402.743   0.397   1.002   0.010   1.000   1.994    0.001
0.008   503.275   502.744   0.440   1.001   0.008   1.000   1.995    0.001
Table 4. Simulation results when the sample data are simulated from IGW(4, 2, 3) following the stopping rules in (16).
δ       N̄          n          se(N̄)   N̄/n     R_N     R_N/δ   mean α̂   se(mean α̂)
0.1     163.049    161.872    0.252   1.007   0.102   1.020   3.975    0.003
0.05    323.060    321.873    0.358   1.004   0.051   1.020   3.989    0.002
0.04    403.029    401.874    0.397   1.003   0.040   1.000   3.992    0.002
0.02    802.248    801.874    0.564   1.000   0.020   1.000   3.995    0.001
0.01    1600.995   1601.875   0.799   0.999   0.010   1.000   3.997    0.001
0.008   2000.736   2001.875   0.890   0.999   0.008   1.000   3.997    0.001
Table 5. Simulation results when the sample data are simulated from IGW(10, 2, 3) following the stopping rules in (16).
δ       N̄          n          se(N̄)   N̄/n     R_N     R_N/δ   mean α̂   se(mean α̂)
1.500   68.916     68.019     0.169   1.013   1.768   1.117   9.828    0.013
1.200   85.162     84.685     0.186   1.006   1.359   1.085   9.840    0.012
0.900   113.015    112.462    0.216   1.005   0.997   1.072   9.885    0.010
0.500   201.991    201.351    0.280   1.003   0.501   0.985   9.941    0.007
0.300   335.972    334.684    0.367   1.004   0.307   1.011   9.973    0.006
0.100   1003.008   1001.350   0.622   1.002   0.097   0.964   9.993    0.003
Table 6. Simulation results for the fixed-accuracy confidence interval when the sample data are simulated from IGW(4, 2, 3) following the rule in (27) with η = 0.05.
d      N     mean α̂_N   c.p.
1.2    117   4.0332     0.9470
1.15   198   4.0188     0.9496
1.1    424   4.0130     0.9485
1.09   519   4.0089     0.9492
1.08   650   4.0054     0.9471
1.07   840   4.0073     0.9489
Table 7. Randomly ordered remission data from 128 bladder cancer patients. The original data source is given in [28].
3.36    9.74    7.09    7.59    1.4     4.23    8.66    25.74   6.97    3.02    2.83    19.13
3.82    12.07   3.48    1.76    32.15   25.82   2.62    2.64    5.71    5.41    10.66   4.51
8.26    3.7     2.02    17.36   14.77   0.5     7.32    3.25    7.66    7.63    5.32    7.28
7.87    6.25    1.26    23.63   5.32    5.09    21.73   79.05   2.54    8.37    26.31   5.17
11.25   13.29   3.57    3.52    7.62    2.87    2.69    2.09    10.34   15.96   18.1    6.94
1.35    1.05    4.4     4.26    17.12   12.02   36.66   0.4     2.26    2.69    0.9     8.65
20.28   5.85    7.39    43.01   0.2     12.03   11.98   7.93    8.53    6.93    2.46    14.24
5.41    9.02    9.22    5.49    17.14   2.23    14.76   46.12   3.36    13.8    5.06    6.76
4.87    9.47    4.5     0.51    11.79   4.33    5.34    1.19    22.69   3.31    10.75   13.11
11.64   7.26    3.88    2.02    4.18    4.34    34.26   5.62    1.46    0.08    0.81    10.06
2.07    4.98    16.62   14.83   6.54    3.64    12.63   2.75
Table 8. Randomly ordered waiting-time data from 100 banks. The original data source is given in [31].
2.9    21.3   9.8    8.0    3.3    8.8    21.4   31.6   5.7    9.6
7.6    4.2    6.9    1.5    13.6   8.6    21.9   11.5   9.7    8.2
17.3   4.8    7.1    1.9    4.0    18.1   3.1    11.1   15.4   4.9
11.2   1.9    3.6    4.7    13.0   7.1    7.4    13.1   2.6    3.5
5.7    8.9    5.0    12.9   10.7   13.9   4.9    23.0   6.3    6.7
9.5    10.9   19.0   6.2    4.3    4.2    5.3    19.9   18.4   5.5
6.1    7.7    2.7    4.3    15.4   3.2    11.0   11.0   4.4    38.5
8.6    11.2   0.8    11.9   17.3   8.6    13.3   33.1   14.1   1.3
1.8    4.6    6.2    7.1    2.1    20.6   12.4   18.9   4.7    12.5
4.1    6.2    18.2   8.9    4.4    8.8    27.0   0.8    13.7   7.1